Raw API Ingestion Pipeline
Raw API Ingestion Pipeline collects data from external operational APIs. It runs scheduled API calls, keeps transformation intentionally thin at the ingestion boundary, and stores the original API response in Supabase raw tables for later SQL modeling, analytics, and BI layers.
Objective
The main goal is to decouple API collection from analytical modeling. Instead of writing directly to a dashboard-ready table or spreadsheet, each worker stores the raw payload with an idempotent identifier, a destination table, and enough metadata to support reprocessing.
- Collect third-party API data on a schedule.
- Persist reliable raw records in Supabase/PostgreSQL.
- Keep historical payloads reprocessable without another external API call.
- Let SQL, BI, and operational models evolve outside the ingestion layer.
Architectural Principles
- The saved payload remains raw.
- The ingestion envelope can vary: external_id, destination table, collection window, logs, and scheduling.
- The raw table mirrors the source API. It is not the business model.
- Logs expose operational metadata, not secrets or full sensitive payloads.
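The split between an untouched payload and a variable envelope can be sketched as follows. This is a minimal illustration, not the project's actual code: the RawRecord class, the wrap_raw helper, and the field names window_start, window_end, and collected_at are assumptions made for the example.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


def _utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class RawRecord:
    """Hypothetical ingestion envelope; field names are illustrative."""

    external_id: str          # idempotent identifier for the row
    destination_table: str    # raw table that mirrors the source API
    payload: dict[str, Any]   # provider response, stored unchanged
    window_start: str         # collection window, kept for reprocessing
    window_end: str
    collected_at: str = field(default_factory=_utc_now)


def wrap_raw(item: dict[str, Any], external_id: str, table: str,
             window_start: str, window_end: str) -> RawRecord:
    # Deep-copy so later mutation of `item` cannot alter the stored payload.
    return RawRecord(
        external_id=external_id,
        destination_table=table,
        payload=copy.deepcopy(item),
        window_start=window_start,
        window_end=window_end,
    )
```

Only the envelope fields are chosen by the worker. The payload is copied as received, so the raw table keeps the provider's shape.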
Why GitHub Actions and Supabase
For lightweight ingestion, GitHub Actions plus Supabase removes the need for a permanent worker or VM. Scheduled workflows provide auditability, manual replay with workflow_dispatch, centralized GitHub Secrets, and simple cost control, while Supabase acts as a durable SQL storage layer for raw records.
SQL and Raw Data Strategy
The important SQL design decision is to store raw API outputs first and derive normalized models later. Tables such as raw_contact_hablla, raw_events_hablla, raw_contact_telefonia, raw_events_faturado, raw_contact_site, and raw_events_agendamento can support downstream SQL views, materialized tables, joins, and BI-friendly datasets without losing the original provider shape.
The external_id field makes repeated collection windows safe: the same five, seven, fifteen, or monthly period can be replayed without duplicating rows when upsert logic is applied consistently.
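The replay behavior can be demonstrated with a small sketch. It uses an in-memory SQLite database as a stand-in for Supabase/PostgreSQL, since both accept the same INSERT ... ON CONFLICT ... DO UPDATE form. The two-column table layout and the upsert_window helper are assumptions for illustration; the real raw tables may carry more columns.

```python
import json
import sqlite3

# In-memory SQLite stands in for Supabase/PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE raw_contact_hablla ("
    " external_id TEXT PRIMARY KEY,"   # assumed unique key for upserts
    " payload TEXT NOT NULL)"          # raw provider JSON, stored as text here
)

UPSERT = (
    "INSERT INTO raw_contact_hablla (external_id, payload) VALUES (?, ?) "
    "ON CONFLICT (external_id) DO UPDATE SET payload = excluded.payload"
)


def upsert_window(connection, records):
    """Write (external_id, payload) pairs; existing rows are overwritten."""
    rows = [(external_id, json.dumps(payload)) for external_id, payload in records]
    with connection:  # commits on success, rolls back on error
        connection.executemany(UPSERT, rows)


window = [
    ("client-1", {"id": 1, "name": "A"}),
    ("client-2", {"id": 2, "name": "B"}),
]
upsert_window(conn, window)
upsert_window(conn, window)  # replaying the same window adds no rows

row_count = conn.execute("SELECT COUNT(*) FROM raw_contact_hablla").fetchone()[0]
```

After both calls, row_count is 2: the second run updates the existing rows instead of inserting duplicates.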
Current Integrations
- Hablla Clients: persons endpoint to raw_contact_hablla, using client-{id}.
- Hablla Cards: cards endpoint to raw_events_hablla, with a default recent window.
- Hablla Attendants: services summary to raw_cs_avaliacao_atendimento, collected day by day.
- Zenvia Calls: voice call reports to raw_contact_telefonia, generally suited to daily close-of-day runs.
- SIGE Faturamento: billed orders to raw_events_faturado, using pedido-{Codigo}.
- Zoho Leads Full and Recent: Creator lead reports to raw_contact_site.
- Zoho Scheduling Full and Recent: scheduling reports to raw_events_agendamento.
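The two identifier patterns named above, client-{id} and pedido-{Codigo}, could be kept in one place with a small registry keyed by destination table. This is a sketch under assumptions: the registry, the external_id_for helper, and the idea that the provider items expose id and Codigo as top-level keys are illustrative. The other integrations are omitted because their identifier rules are not stated here.

```python
from typing import Any, Callable

ExternalIdBuilder = Callable[[dict[str, Any]], str]

# Only the two patterns documented in the list above are registered.
EXTERNAL_ID_BUILDERS: dict[str, ExternalIdBuilder] = {
    "raw_contact_hablla": lambda person: f"client-{person['id']}",
    "raw_events_faturado": lambda order: f"pedido-{order['Codigo']}",
}


def external_id_for(table: str, item: dict[str, Any]) -> str:
    """Build the idempotent identifier for one provider item."""
    try:
        builder = EXTERNAL_ID_BUILDERS[table]
    except KeyError:
        # Fail loudly instead of guessing an identifier for an unknown table.
        raise ValueError(f"no external_id rule registered for {table}") from None
    return builder(item)
```

For example, external_id_for("raw_events_faturado", {"Codigo": 1001}) yields "pedido-1001".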
Scheduling Model
The project mixes frequent short windows with less frequent larger windows. Hablla clients/cards and recent Zoho jobs run several times per day. Zenvia, SIGE, Hablla attendants, and full Zoho loads run daily. This reduces API pressure, keeps fresh operational data available, and lowers the risk of workflow overlap.
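A collection window for the frequent jobs, and the day-by-day iteration used for sources such as Hablla attendants, might be computed like this. The function names and the inclusive-window convention are assumptions for illustration; the actual window lengths are set per workflow.

```python
from datetime import date, timedelta
from typing import Iterator


def recent_window(today: date, days: int) -> tuple[date, date]:
    """Inclusive window covering the last `days` days, ending at `today`."""
    if days < 1:
        raise ValueError("days must be at least 1")
    return today - timedelta(days=days - 1), today


def daily_slices(start: date, end: date) -> Iterator[date]:
    """Yield each day in [start, end], for sources collected day by day."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


# A seven-day window ending on 2024-03-10 starts on 2024-03-04.
start, end = recent_window(date(2024, 3, 10), days=7)
days_to_collect = list(daily_slices(start, end))
```

Because consecutive runs of a short window overlap, the same rows are fetched more than once, which is why the upsert on external_id matters.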
Security and Logs
Because workflows can expose public run metadata, error logs are sanitized. The project avoids logging tokens, secrets, complete raw payloads, and full provider error responses. Logs focus on status codes, summarized messages, and operational context that can be debugged without leaking credentials.
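One way to produce such a sanitized log line is sketched below. The regular expressions, the summarize_error helper, and the 120-character limit are assumptions for illustration, not the project's actual rules; a real deployment would tune the patterns to each provider's error format.

```python
import re

# Hypothetical redaction patterns for common credential shapes.
_BEARER = re.compile(r"(?i)\bbearer\s+\S+")
_KEY_VALUE = re.compile(
    r'(?i)\b(token|secret|password|api[_-]?key|authorization)\b("?\s*[=:]\s*"?)\S+'
)


def summarize_error(source: str, status_code: int, body: str, limit: int = 120) -> str:
    """Return a short log line with credentials redacted and the body truncated."""
    # Redact bearer tokens first so the key/value rule cannot leave one behind.
    text = _BEARER.sub("Bearer [REDACTED]", body)
    text = _KEY_VALUE.sub(r"\1\2[REDACTED]", text)
    # Collapse whitespace and truncate so full provider responses never reach the log.
    text = " ".join(text.split())[:limit]
    return f"source={source} status={status_code} message={text}"


line = summarize_error("zenvia", 401, "Unauthorized: token=abc123 for request")
```

Here line keeps the status code and a readable summary, with the token value replaced by [REDACTED].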
Migration From Google Sheets
The older pattern synchronized API data directly into Google Sheets after early mapping and filtering. Raw API Ingestion Pipeline moves the storage boundary earlier: raw API payloads are captured in Supabase first, then SQL modeling and analytics happen downstream. This improves auditability, replayability, and resilience against changing business rules.