This project demonstrates how to manipulate CSV files using Python. It includes scripts to merge multiple CSV files and to remove duplicates from a CSV file.
csv-manipulations-python/
│
├── csv/ # Folder containing CSV files
│ ├── data1.csv
│ ├── data2.csv
│ └── data3.csv
│
├── out/ # Folder for output files
│ ├── cleaned_output.csv
│ └── merged_output.csv
│
├── src/ # Source code folder
│ ├── merge_csv_files.py
│ └── remove_duplicates_from_csv.py
│
├── README.md # Project documentation
└── requirements.txt # Dependencies
Make sure you have Python installed. You can download it from python.org.
Install the required Python packages using pip:
pip install -r requirements.txt
The merge_csv_files.py script merges all CSV files in the csv folder into a single file.
Command:
python src/merge_csv_files.py
Output:
The merged output will be saved to out/merged_output.csv.
The remove_duplicates_from_csv.py script removes duplicate rows from the specified CSV file.
Command:
python src/remove_duplicates_from_csv.py
Output:
The cleaned file, with duplicates removed, will be saved to out/cleaned_output.csv.
merge_csv_files.py
This script reads all CSV files from the specified folder, merges them into a single DataFrame, and saves the merged DataFrame to an output CSV file.
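The read, merge, and save steps described above can be sketched roughly as follows. This is an assumption for illustration, not the project's actual code: it uses only the standard-library csv module rather than a pandas DataFrame, reads the folder's *.csv files in sorted name order, and builds the output header as the union of all input headers in first-seen order (loosely mirroring how pandas.concat aligns columns by name).

```python
import csv
from pathlib import Path


def merge_csv_files(folder_path, output_file):
    """Merge every *.csv file in folder_path into output_file.

    Stdlib-only sketch (an assumption, not the project's pandas-based code).
    """
    # Read files in a deterministic (sorted) order; subfolders are not searched.
    csv_paths = sorted(Path(folder_path).glob("*.csv"))
    fieldnames = []  # union of all headers, in first-seen order
    rows = []
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f)
            # Add any column names not seen in earlier files.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)

    # Create the output folder (for example out/) if it does not exist yet.
    Path(output_file).parent.mkdir(parents=True, exist_ok=True)
    with open(output_file, "w", newline="", encoding="utf-8") as f:
        # restval="" fills columns that a given input file did not have.
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

Filling missing columns with empty strings keeps files with slightly different headers mergeable. A row with more values than its file's header would make DictWriter raise a ValueError; the sketch does not handle that case.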
Function: merge_csv_files
def merge_csv_files(folder_path, output_file):
    """
    Merges all CSV files in the specified folder into a single DataFrame and saves it to output_file.

    Parameters:
        folder_path (str): The path to the folder containing the CSV files.
        output_file (str): The path to the output CSV file.
    """
remove_duplicates_from_csv.py
This script removes duplicate rows from the specified CSV file and saves the cleaned DataFrame to an output CSV file.
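A comparable standard-library sketch of this step follows; again, it is an assumption rather than the project's code. A row counts as a duplicate only if all of its fields match an earlier row exactly as text, the first occurrence is kept, and the original row order is preserved. These match the defaults of pandas' DataFrame.drop_duplicates(), except that pandas compares parsed values rather than raw strings.

```python
import csv
from pathlib import Path


def remove_duplicates(input_file, output_file):
    """Copy input_file to output_file, dropping exact duplicate rows.

    Stdlib-only sketch (an assumption, not the project's pandas-based code).
    """
    seen = set()
    unique_rows = []
    with open(input_file, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)  # None if the file is empty
        for row in reader:
            key = tuple(row)  # whole row, compared as raw text
            if key not in seen:  # keep only the first occurrence
                seen.add(key)
                unique_rows.append(row)

    # Create the output folder (for example out/) if it does not exist yet.
    Path(output_file).parent.mkdir(parents=True, exist_ok=True)
    with open(output_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        writer.writerows(unique_rows)
```

Every unique row is held in memory, which is fine for files that fit in RAM but would need a different approach for very large inputs.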
Function: remove_duplicates
def remove_duplicates(input_file, output_file):
    """
    Removes duplicate rows from the specified CSV file and saves the cleaned data to output_file.

    Parameters:
        input_file (str): The path to the input CSV file.
        output_file (str): The path to the output CSV file.
    """
If you would like to contribute to this project, please fork the repository and submit a pull request.
This project is licensed under the MIT License.
© 2024 Lahcen Ezzara. All rights reserved.