sanitized_csv

📂 CSV/Excel Data Sanitization Tool

Go-based tool for automated CSV/Excel processing with row filtering and organized output.

✨ Key Features

  • Auto Conversion: Transforms Excel files (.xlsx/.xls) to CSV
  • Smart Filtering: Removes rows where a specified column is empty
  • Structured Output: Organizes results in file-specific folders
  • Multi-file Support: Processes multiple files via .env config
  • Detailed Logging: Tracks all operations with severity levels

⚡ Prerequisites

  • Go 1.16+
  • Excelize v2:
    go get github.com/xuri/excelize/v2

🔧 Configuration (.env)

# Base directories
DATA_OUTPUT_DIR=data  # Processed output folder
DOCS_DIR=docs         # Source files folder

# File configuration (default for 3 files)
FILE_1=report1.xlsx
EXCLUDE_ROW_FILE_1=Sale_ID  # Rows with an empty value in this column are removed

FILE_2=customers.csv
EXCLUDE_ROW_FILE_2=SSN

FILE_3=products.xls
EXCLUDE_ROW_FILE_3=Internal_Code
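
The numbered FILE_N / EXCLUDE_ROW_FILE_N pairs above could be read along these lines. This is a minimal sketch, not the tool's actual loader: the `fileEntry` type and `loadEntries` function are hypothetical names, the code assumes the .env values are already exported into the process environment, and it assumes scanning stops at the first missing FILE_N.

```go
package main

import (
	"fmt"
	"os"
)

// fileEntry pairs a source file with the column whose empty values
// trigger row removal. (Hypothetical type, for illustration only.)
type fileEntry struct {
	Name          string // value of FILE_N
	ExcludeColumn string // value of EXCLUDE_ROW_FILE_N
}

// loadEntries reads FILE_1, FILE_2, ... through the supplied lookup
// function and stops at the first missing or empty FILE_N (an assumption).
func loadEntries(lookup func(string) (string, bool)) []fileEntry {
	var entries []fileEntry
	for i := 1; ; i++ {
		name, ok := lookup(fmt.Sprintf("FILE_%d", i))
		if !ok || name == "" {
			break
		}
		col, _ := lookup(fmt.Sprintf("EXCLUDE_ROW_FILE_%d", i))
		entries = append(entries, fileEntry{Name: name, ExcludeColumn: col})
	}
	return entries
}

func main() {
	// os.LookupEnv only sees variables already present in the environment.
	for _, e := range loadEntries(os.LookupEnv) {
		fmt.Printf("%s (filter column: %s)\n", e.Name, e.ExcludeColumn)
	}
}
```

Passing the lookup function as a parameter keeps the sketch independent of how the .env file is actually parsed.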

🚀 Basic Usage

  1. Place files in the docs folder
  2. Configure .env variables
  3. Run:
go run main.go

# Expected output:
✅ processed docs/report1.xlsx → data/report1/sanitized_report1.csv
✅ processed docs/customers.csv → data/customers/sanitized_customers.csv
⏭️  skipping products.xls (unsupported extension)

🗂 Output Structure

data/
├── report1/
│   ├── report1.csv         # Converted CSV
│   └── sanitized_report1.csv  # Filtered data
└── customers/
    ├── customers.csv
    └── sanitized_customers.csv

🔄 Processing Flow

graph TD
    A[Start] --> B[Load .env]
    B --> C[Create output dirs]
    C --> D[Process each FILE_X]
    D --> E{Is Excel?}
    E -->|Yes| F[Convert to CSV]
    E -->|No| G[Copy CSV]
    F --> H[Filter rows]
    G --> H
    H --> I[Generate sanitized file]

🛑 Error Handling

| Common Error | Solution |
| --- | --- |
| DOCS_DIR is not set | Set variable in .env |
| Unsupported extension | Use .csv, .xlsx or .xls |
| Columns mismatch | Verify file headers |
| Permission denied | Adjust folder permissions |

🔄 Customization

1. Add More Files
Edit .env:

FILE_4=new_data.xlsx
EXCLUDE_ROW_FILE_4=Expiry_Date

2. Modify Filter Logic
Adjust filterCSV() for different conditions:

// Example: filter specific values instead of empty
if row[idx] == "DELETE" {
    continue
}

3. Add New Formats
Extend processFile() for other types:

case ".ods":
    err = convertODSToCSV(inputPath, origPath)

📄 License

MIT License - See LICENSE for details.


Optimizations:

  • Parallel file processing
  • Automatic encoding detection
  • Processed files versioning
  • Execution metrics tracking

Note: For Portuguese version, see README_PT-BR.md