📂 CSV/Excel Data Sanitization Tool
Go-based tool for automated CSV/Excel processing with row filtering and organized output.
✨ Key Features
- Auto Conversion: Converts Excel files (.xlsx) to CSV; legacy .xls files are skipped as an unsupported extension
- Smart Filtering: Removes rows where a configured column is empty
- Structured Output: Organizes results in file-specific folders
- Multi-file Support: Processes multiple files via .env config
- Detailed Logging: Tracks all operations with severity levels
⚡ Prerequisites
- Go 1.16+
- Excelize v2:
go get github.com/xuri/excelize/v2
🔧 Configuration (.env)
# Base directories
DATA_OUTPUT_DIR=data   # Processed output folder
DOCS_DIR=docs          # Source files folder

# File configuration (default for 3 files)
FILE_1=report1.xlsx
EXCLUDE_ROW_FILE_1=Sale_ID   # Column to filter empty rows
FILE_2=customers.csv
EXCLUDE_ROW_FILE_2=SSN
FILE_3=products.xls
EXCLUDE_ROW_FILE_3=Internal_Code
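The numbered FILE_N / EXCLUDE_ROW_FILE_N pairs can be read with a simple loop. The following is a minimal sketch, not the tool's actual code: the `FileConfig` type and `loadFileConfigs` function are hypothetical names, and it assumes the variables from .env have already been exported into the process environment (Go does not read .env files on its own).

```go
package main

import (
	"fmt"
	"os"
)

// FileConfig pairs a source file with the column used for filtering.
// Hypothetical type, introduced only for this illustration.
type FileConfig struct {
	Name          string // value of FILE_N
	ExcludeColumn string // value of EXCLUDE_ROW_FILE_N
}

// loadFileConfigs reads FILE_1, FILE_2, ... and stops at the first index
// that is unset or empty. The lookup function is injected so that it can
// be os.Getenv in production or a stub elsewhere.
func loadFileConfigs(lookup func(string) string) []FileConfig {
	var configs []FileConfig
	for i := 1; ; i++ {
		name := lookup(fmt.Sprintf("FILE_%d", i))
		if name == "" {
			break
		}
		configs = append(configs, FileConfig{
			Name:          name,
			ExcludeColumn: lookup(fmt.Sprintf("EXCLUDE_ROW_FILE_%d", i)),
		})
	}
	return configs
}

func main() {
	for _, c := range loadFileConfigs(os.Getenv) {
		fmt.Printf("%s (filter on %q)\n", c.Name, c.ExcludeColumn)
	}
}
```

Stopping at the first gap keeps the numbering contiguous, so a missing FILE_2 would hide FILE_3; a fixed upper bound is an equally reasonable design if gaps should be tolerated.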
🚀 Basic Usage
- Place source files in the docs folder
- Configure .env variables
- Run:
go run main.go

# Expected output:
✅ processed docs/report1.xlsx → data/report1/sanitized_report1.csv
✅ processed docs/customers.csv → data/customers/sanitized_customers.csv
⏭️ skipping products.xls (unsupported extension)
🗂 Output Structure
data/
├── report1/
│   ├── report1.csv             # Converted CSV
│   └── sanitized_report1.csv   # Filtered data
└── customers/
    ├── customers.csv
    └── sanitized_customers.csv
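The layout above implies a simple path rule: each source file gets a folder named after its base name, holding the converted CSV and a `sanitized_` copy. A minimal sketch of that rule follows; `outputPaths` is a hypothetical helper, not a function confirmed to exist in the tool.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// outputPaths derives the converted and sanitized CSV paths for a source
// file, following the <outputDir>/<stem>/ layout shown in the README.
func outputPaths(outputDir, source string) (converted, sanitized string) {
	base := filepath.Base(source)                        // e.g. report1.xlsx
	stem := strings.TrimSuffix(base, filepath.Ext(base)) // e.g. report1
	dir := filepath.Join(outputDir, stem)
	converted = filepath.Join(dir, stem+".csv")
	sanitized = filepath.Join(dir, "sanitized_"+stem+".csv")
	return converted, sanitized
}

func main() {
	converted, sanitized := outputPaths("data", "docs/report1.xlsx")
	fmt.Println(converted)
	fmt.Println(sanitized)
}
```

Before writing either file, the per-file folder would still need to be created, for example with `os.MkdirAll`.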
🔄 Processing Flow
graph TD
    A[Start] --> B[Load .env]
    B --> C[Create output dirs]
    C --> D[Process each FILE_X]
    D --> E{Is Excel?}
    E -->|Yes| F[Convert to CSV]
    E -->|No| G[Copy CSV]
    F --> H[Filter rows]
    G --> H
    H --> I[Generate sanitized file]
🛑 Error Handling
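Common Error | Solution
--- | ---
DOCS_DIR is not set | Define DOCS_DIR in .env
Unsupported extension | Use a .xlsx or .csv source file
Columns mismatch |
Permission denied | Check read/write permissions on the DOCS_DIR and DATA_OUTPUT_DIR folders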
🔄 Customization
1. Add More Files
Edit .env:
FILE_4=new_data.xlsx
EXCLUDE_ROW_FILE_4=Expiry_Date
2. Modify Filter Logic
Adjust filterCSV():
// Example: filter specific values instead of empty
if row[idx] == "DELETE" {
    continue
}
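For context, the empty-column filter that this customization modifies might look roughly like the sketch below. The body of the tool's `filterCSV()` is not shown in this README, so `filterRows` here is a hypothetical stand-in built on the standard `encoding/csv` package; it assumes the first record is a header and that whitespace-only cells count as empty.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"log"
	"os"
	"strings"
)

// filterRows keeps the header plus every row whose value in the named
// column is non-empty after trimming whitespace.
func filterRows(records [][]string, column string) ([][]string, error) {
	if len(records) == 0 {
		return nil, errors.New("no header row")
	}
	// Locate the filter column in the header.
	idx := -1
	for i, name := range records[0] {
		if strings.TrimSpace(name) == column {
			idx = i
			break
		}
	}
	if idx < 0 {
		return nil, fmt.Errorf("column %q not found in header", column)
	}
	kept := [][]string{records[0]}
	for _, row := range records[1:] {
		// Rows too short to contain the column are treated as empty.
		if idx >= len(row) || strings.TrimSpace(row[idx]) == "" {
			continue
		}
		kept = append(kept, row)
	}
	return kept, nil
}

func main() {
	input := "Sale_ID,Amount\n1001,25.00\n,13.50\n1003,8.75\n"
	records, err := csv.NewReader(strings.NewReader(input)).ReadAll()
	if err != nil {
		log.Fatal(err)
	}
	kept, err := filterRows(records, "Sale_ID")
	if err != nil {
		log.Fatal(err)
	}
	// WriteAll flushes the writer and reports any write error.
	if err := csv.NewWriter(os.Stdout).WriteAll(kept); err != nil {
		log.Fatal(err)
	}
}
```

`ReadAll` loads the whole file into memory, which is fine for small inputs; for large files a row-by-row `Read` loop would be the safer choice.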
3. Add New Formats
Extend processFile():
case ".ods":
    err = convertODSToCSV(inputPath, origPath)
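The `case` above suggests that `processFile()` switches on the file extension. The sketch below illustrates that kind of dispatch under that assumption; the `action` type and `classify` function are hypothetical and only mirror the behavior shown in the expected output (.xlsx converted, .csv copied, anything else skipped).

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// action names what should happen to a source file.
type action string

const (
	actionConvert action = "convert" // spreadsheet that needs CSV conversion
	actionCopy    action = "copy"    // already CSV, copy as-is
	actionSkip    action = "skip"    // unsupported extension
)

// classify picks an action from the file extension, case-insensitively.
// Supporting a new format means adding one more case here.
func classify(name string) action {
	switch strings.ToLower(filepath.Ext(name)) {
	case ".xlsx":
		return actionConvert
	case ".csv":
		return actionCopy
	default:
		return actionSkip
	}
}

func main() {
	for _, name := range []string{"report1.xlsx", "customers.csv", "products.xls"} {
		fmt.Printf("%s: %s\n", name, classify(name))
	}
}
```

Lowercasing the extension first means files such as REPORT.XLSX take the same path as report.xlsx.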
📄 License
MIT License - See LICENSE for details.
Optimizations:
- Parallel file processing
- Automatic encoding detection
- Processed files versioning
- Execution metrics tracking
Note: For Portuguese version, see README_PT-BR.md