Data Pipelines
How YouTube, time-management, migration, and CMS data become runtime artifacts.
The site keeps expensive parsing out of runtime pages. Raw exports are processed offline into compact JSON files under public/data/, and client experiences read those artifacts directly.
Pipeline overview
| Experience | Raw/source data | Processor | Runtime artifact |
|---|---|---|---|
| YouTube Scholar | public/data/Youtube_Data/ Takeout files | scripts/process-youtube-data.js | public/data/youtube-scholar.json |
| Curiosity Velocity | YouTube Takeout history/channel data | scripts/process-youtube-data.js | public/data/youtube-curiosity-velocity.json |
| Time Management | Screen-time spreadsheets under public/data/time-management/ | scripts/time-management/*.py | public/data/time-management/analysis.json, activities_data.json |
| Bird Migration | public/data/bird_migration.csv | parser utilities and committed processed data | public/data/migration_lite.json |
| Blog/ranked reviews | WordPress CMS | GraphQL fetches | Static route output |
YouTube data
Run:
```bash
npm run youtube:data
```

This calls scripts/process-youtube-data.js. Keep raw Takeout files and generated artifacts clearly separated. If adding support for another Takeout export shape, normalize it in the script rather than branching throughout React components.
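The idea of normalizing export shapes in one place can be sketched as follows. This is an illustration in Python only; scripts/process-youtube-data.js is JavaScript and its internals are not shown here. Both input shapes, the field names, and the function name `normalize_watch_event` are assumptions made for the example.

```python
from datetime import datetime, timezone


def normalize_watch_event(raw):
    """Map one raw history entry onto a single internal shape.

    Downstream code only ever sees: title, url, channel, watched_at.
    """
    # Shape A (assumed): JSON-style entry with "title", "titleUrl",
    # "time", and an optional "subtitles" list naming the channel.
    if "titleUrl" in raw:
        title = raw.get("title", "")
        prefix = "Watched "
        if title.startswith(prefix):
            title = title[len(prefix):]
        subtitles = raw.get("subtitles") or [{}]
        return {
            "title": title,
            "url": raw["titleUrl"],
            "channel": subtitles[0].get("name"),
            "watched_at": raw["time"],
        }

    # Shape B (hypothetical): flat record with an epoch-seconds timestamp.
    if "video_url" in raw:
        watched = datetime.fromtimestamp(raw["timestamp"], tz=timezone.utc)
        return {
            "title": raw.get("video_title", ""),
            "url": raw["video_url"],
            "channel": raw.get("channel_name"),
            "watched_at": watched.isoformat(),
        }

    raise ValueError("unrecognized export shape: %r" % sorted(raw))
```

With this pattern, supporting a new export shape means adding one branch in the processor, and the components that read the generated JSON stay unchanged.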
Time-management data
The time-management pipeline is Python-based:
```bash
python3 scripts/time-management/explore_data.py
python3 scripts/time-management/merge_activities.py
python3 scripts/time-management/analyze_time_usage.py
python3 scripts/time-management/generate_detailed_analysis.py
```

Use the generated JSON files from public/data/time-management/ in the dashboard. Do not make the dashboard parse .xlsx files at runtime.
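If running the four commands by hand becomes error-prone, a small wrapper could chain them. The sketch below assumes the scripts are meant to run in the order listed and from the repository root, neither of which this document states explicitly. `run_pipeline` and `STEPS` are hypothetical names, not existing project code.

```python
import subprocess
import sys

# Assumed execution order: the order in which the commands are listed.
STEPS = [
    "scripts/time-management/explore_data.py",
    "scripts/time-management/merge_activities.py",
    "scripts/time-management/analyze_time_usage.py",
    "scripts/time-management/generate_detailed_analysis.py",
]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each step in order and stop at the first failure."""
    for script in steps:
        print(f"running {script}")
        # check=True makes subprocess.run raise CalledProcessError
        # when a script exits with a non-zero status.
        runner([sys.executable, script], check=True)
```

Calling `run_pipeline()` executes the steps with the current Python interpreter. The `runner` parameter exists only so the sequencing can be exercised without launching real processes.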
Static artifact rules
- Commit small, public, demo-safe generated JSON when it is needed by static pages.
- Do not commit private raw exports unless the repo is explicitly meant to publish them.
- Prefer compact arrays or pre-aggregated summaries for visualization payloads.
- Keep schema notes in this document when adding a new generated artifact.
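As one possible reading of "compact arrays", a list of per-record objects can be rewritten as a field list plus rows, so key names are stored once instead of once per record. The record fields and the helper name `to_columnar` below are invented for illustration and do not describe any existing artifact's schema.

```python
import json


def to_columnar(records, fields):
    """Turn a list of dicts into {"fields": [...], "rows": [[...], ...]}."""
    return {
        "fields": list(fields),
        "rows": [[rec.get(f) for f in fields] for rec in records],
    }


# Hypothetical sample records, not real project data.
records = [
    {"date": "2024-01-01", "category": "Reading", "minutes": 42},
    {"date": "2024-01-02", "category": "Reading", "minutes": 30},
]

compact = to_columnar(records, ["date", "category", "minutes"])

# separators=(",", ":") drops the default whitespace in the output.
verbose_json = json.dumps(records, separators=(",", ":"))
compact_json = json.dumps(compact, separators=(",", ":"))
```

The saving grows with the number of records, since each extra row adds only values. The trade-off is that the client has to map row positions back to field names, which is one reason to keep schema notes next to the artifact.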
Adding a new data story
- Create a route under src/app/projects/<story>/ or src/app/blog/post/interactive/<story>/.
- Add processing code under scripts/<story>/.
- Write generated runtime artifacts under public/data/<story>/.
- Keep raw inputs out of UI code.
- Document the input format, command, and output file here.
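A processing script for a new story might look roughly like the sketch below: it reads a raw export, pre-aggregates it, and writes one JSON file into the story's output directory. The CSV columns (`category`, `value`), the output file name `summary.json`, and the function name `build_artifact` are all placeholders chosen for the example.

```python
import csv
import json
from pathlib import Path


def build_artifact(raw_csv, out_dir):
    """Read a raw CSV export and write a compact summary JSON.

    Returns the path of the generated artifact.
    """
    totals = {}
    with Path(raw_csv).open(newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            # "category" and "value" are hypothetical column names.
            key = row["category"]
            totals[key] = totals.get(key, 0.0) + float(row["value"])

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / "summary.json"
    out_path.write_text(
        json.dumps({"totals": totals}, separators=(",", ":")),
        encoding="utf-8",
    )
    return out_path
```

The route then loads only the generated file, so the raw export never has to be parsed or shipped by the UI.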
Privacy boundary
Personal exports can reveal browsing, watch history, routines, and preferences. Treat raw exports as private by default. A public artifact should be anonymized, aggregated, or intentionally published as part of the narrative.
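To make the "aggregated" option concrete, the sketch below reduces per-video events to per-day counts, so titles and URLs never reach the public artifact. The event shape (a `watched_at` ISO 8601 timestamp next to other fields) and the function name `daily_counts` are assumptions for illustration.

```python
from collections import Counter


def daily_counts(events):
    """Reduce per-video events to per-day counts, dropping titles and URLs."""
    # The first 10 characters of an ISO 8601 timestamp are "YYYY-MM-DD",
    # expressed in whatever timezone the timestamp itself uses.
    counts = Counter(event["watched_at"][:10] for event in events)
    return [{"date": day, "count": n} for day, n in sorted(counts.items())]
```

Aggregation like this lowers the detail exposed but is not a guarantee of anonymity: a day with a single event still reveals that something was watched then, so sparse data may need coarser buckets.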