Data Pipeline Overview
This project follows a four-step pipeline: Collect → Validate → Clean → Analyze. Three source files are consolidated into a single clean dataset for analysis.
| Step | Source | Input | Output |
|---|---|---|---|
| 1 | Sales Transactions | Raw transaction data (CSV) | 2,547 raw rows |
| 2 | Sales Targets | Target per channel/month | Target benchmarks |
| 3 | Customer Feedback | Survey responses | 899 feedback entries |
| 4 | Cleaning & Analysis | All 3 sources | 2,460 clean rows + insights |
Anomaly Summary
Of the 2,547 raw rows, several types of anomalies were found and had to be cleaned before analysis:
| Issue Type | Count | Action Taken |
|---|---|---|
| Duplicate rows | 24 | Removed exact duplicates |
| Negative values | 15 | Converted to absolute values |
| Missing fields | 18 | Imputed or removed |
| Format inconsistency | 20 | Standardized date/text formats |
| Outliers (price zero) | 10 | Removed zero-price transactions |
⚠️ Data Quality Note
A total of 87 rows were affected by anomalies (3.4% of the raw data). After cleaning, 2,460 clean rows remained, a 96.6% retention rate. This is within an acceptable range for consolidated multi-source data.
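The figures in this note can be re-derived from the Anomaly Summary table. The short Python check below is only an illustrative sketch (the project itself uses Excel, and the variable names here are assumptions); it sums the per-issue counts and recomputes both percentages. It also shows that 2,547 − 87 = 2,460, meaning the stated clean-row count equals the raw count minus every affected row.

```python
# Counts copied from the Anomaly Summary table above.
anomaly_counts = {
    "Duplicate rows": 24,
    "Negative values": 15,
    "Missing fields": 18,
    "Format inconsistency": 20,
    "Outliers (price zero)": 10,
}

RAW_ROWS = 2547    # raw rows reported for the Sales Transactions source
CLEAN_ROWS = 2460  # clean rows reported after cleaning

# Total rows touched by any anomaly type.
affected = sum(anomaly_counts.values())

# Share of raw rows affected, and share of raw rows retained, in percent.
affected_pct = round(100 * affected / RAW_ROWS, 1)
retention_pct = round(100 * CLEAN_ROWS / RAW_ROWS, 1)

print(f"affected rows: {affected} ({affected_pct}% of raw)")
print(f"retention: {retention_pct}%")
print(f"raw minus affected: {RAW_ROWS - affected}")
```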
Cleaning Process
Cleaning was carried out systematically, in the following priority order:
- Remove duplicates: identify and delete rows that are exact copies
- Fix negative values: convert negative values in the quantity/revenue columns to absolute values
- Handle missing data: impute non-critical fields, remove rows with missing critical fields
- Standardize formats: unify date formats, channel names, and product categories
- Remove outliers: delete transactions with a price of 0 or otherwise implausible values
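For readers who prefer code to spreadsheet formulas, the five steps above could be sketched roughly as follows. This is not the project's actual method (the cleaning was done in Excel); it is a minimal standard-library Python illustration. The column names (`date`, `channel`, `quantity`, `price`, `revenue`), the choice of critical fields, the imputation default, the accepted date formats, and the sample rows are all hypothetical.

```python
from datetime import datetime

# Hypothetical settings; the real workbook may use different rules.
CRITICAL = ("date", "price")             # rows missing these are dropped
DEFAULTS = {"channel": "unknown"}        # imputed values for non-critical fields
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # input formats assumed to occur


def parse_date(text):
    """Return the date as ISO YYYY-MM-DD, trying each assumed input format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def clean(rows):
    seen, cleaned_rows = set(), []
    for raw_row in rows:
        # 1. Remove duplicates: skip rows identical to one already seen.
        key = tuple(sorted(raw_row.items()))
        if key in seen:
            continue
        seen.add(key)
        row = dict(raw_row)

        # 2. Fix negative values in the quantity/revenue columns.
        for col in ("quantity", "revenue"):
            if isinstance(row.get(col), (int, float)):
                row[col] = abs(row[col])

        # 3. Handle missing data: drop if a critical field is empty,
        #    otherwise impute non-critical fields.
        if any(row.get(col) in (None, "") for col in CRITICAL):
            continue
        for col, default in DEFAULTS.items():
            if row.get(col) in (None, ""):
                row[col] = default

        # 4. Standardize formats for dates and channel names.
        row["date"] = parse_date(row["date"])
        row["channel"] = row["channel"].strip().lower()

        # 5. Remove outliers: drop zero-price transactions.
        if row["price"] == 0:
            continue
        cleaned_rows.append(row)
    return cleaned_rows


# Hypothetical sample rows, one per anomaly type.
raw = [
    {"date": "2024-01-05", "channel": " Online ", "quantity": 2, "price": 10.0, "revenue": 20.0},
    {"date": "2024-01-05", "channel": " Online ", "quantity": 2, "price": 10.0, "revenue": 20.0},
    {"date": "06/01/2024", "channel": "Retail", "quantity": -1, "price": 5.0, "revenue": -5.0},
    {"date": "2024-01-07", "channel": None, "quantity": 1, "price": 8.0, "revenue": 8.0},
    {"date": None, "channel": "Online", "quantity": 1, "price": 8.0, "revenue": 8.0},
    {"date": "2024-01-09", "channel": "Online", "quantity": 1, "price": 0, "revenue": 0},
]
cleaned = clean(raw)
print(len(raw), "raw rows,", len(cleaned), "clean rows")
```

Deduplication runs on the untouched raw values, so only exact copies are removed, matching the first step in the list.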
Tools & Stack
| Tool | Purpose |
|---|---|
| Microsoft Excel | Data cleaning, pivot tables, formula-based analysis |
| Google Sheets | Collaborative review, sharing with stakeholders |
| HTML/CSS/JS | Portfolio presentation (this website) |
Reproducibility
All cleaning steps are documented in the Excel workbook with formulas (not hardcoded values). The analysis can be reproduced by anyone with access to the original 3 source files.