Data Engineering

Raw API Ingestion with GitHub Actions, Supabase, and SQL

Raw ingestion architecture is a small design choice with large consequences. Raw API Ingestion Pipeline keeps third-party API payloads intact in Supabase raw tables, then lets SQL-derived layers, BI models, and operational dashboards evolve without forcing another expensive round of API collection.

The ingestion boundary matters

Many API integration projects start by mapping provider data directly into a report shape. That feels efficient at first, but it couples collection to the current business question. When the dashboard changes, the API shape changes, or a new SQL model is needed, the system may have to pull old data again from the provider.

Raw API Ingestion Pipeline takes the opposite approach. Workers collect raw payloads from Hablla, Zoho Creator, Zenvia Voice, and SIGE ERP APIs, then store them in Supabase/PostgreSQL raw tables. The ingestion layer only wraps the payload with operational metadata such as external_id, destination table, run window, and timestamps.
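The envelope idea can be sketched in a few lines. This is a minimal illustration, not the project's actual code; the function name and field names are assumptions based on the metadata listed above:

```python
from datetime import datetime, timezone

def wrap_payload(payload: dict, external_id: str, table: str,
                 window_start: str, window_end: str) -> dict:
    """Wrap a raw API payload with operational metadata only.

    The payload itself is stored untouched; ingestion adds just enough
    context to make the row replayable and auditable later.
    """
    return {
        "external_id": external_id,     # stable, source-derived key
        "destination_table": table,     # e.g. raw_contact_hablla
        "window_start": window_start,   # run window covered by this fetch
        "window_end": window_end,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "payload": payload,             # the untouched provider response
    }

row = wrap_payload({"id": 42, "name": "Ana"}, "card-42",
                   "raw_contact_hablla", "2024-01-01", "2024-01-07")
```

The point of the shape is that nothing inside `payload` is interpreted at ingestion time; interpretation belongs to downstream SQL.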

A raw table should preserve evidence. It is not the final business model; it is the replayable source that lets future SQL models stay honest.

Why raw_* SQL tables are useful

Supabase makes PostgreSQL available as an operational storage layer, so each integration can land in a predictable SQL namespace. Tables like raw_contact_hablla, raw_events_hablla, raw_contact_telefonia, raw_events_faturado, raw_contact_site, and raw_events_agendamento are intentionally source-oriented.

This gives the system three advantages. First, old payloads can be reprocessed into new reporting tables. Second, SQL joins and enrichments can be tested without asking the external API for the same data again. Third, schema drift becomes visible in the raw layer instead of being silently flattened by early mapping logic.
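The first advantage, reprocessing, is easy to demonstrate. The sketch below uses sqlite3 so it runs anywhere, but Supabase is PostgreSQL in practice; the table shape and sample data are assumptions, not the project's real schema:

```python
import json
import sqlite3

# A raw table stores the full provider response as JSON text, keyed by
# a stable external ID. sqlite3 stands in for Supabase/PostgreSQL here.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE raw_contact_hablla (
        external_id TEXT PRIMARY KEY,
        ingested_at TEXT NOT NULL,
        payload     TEXT NOT NULL   -- untouched provider response
    )
""")
db.execute(
    "INSERT INTO raw_contact_hablla VALUES (?, ?, ?)",
    ("client-7", "2024-01-02T00:00:00Z",
     json.dumps({"id": 7, "name": "Ana", "tags": ["vip"]})),
)

# Reprocessing: derive a brand-new reporting shape from stored payloads,
# without asking the external API for the same data again.
contacts = [
    (eid, json.loads(raw)["name"])
    for eid, raw in db.execute(
        "SELECT external_id, payload FROM raw_contact_hablla")
]
```

Because the payload column still holds every field the provider returned, a new model that suddenly needs `tags` can be built from the same rows.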

Idempotent external IDs make replay safe

Scheduled ingestion nearly always repeats windows. A worker may fetch the last five days of SIGE orders, the last seven days of Hablla cards, the current and previous month of Zoho scheduling records, or a recent window of Zenvia calls. Repetition is healthy because it catches late updates, but only if writes are idempotent.

Raw API Ingestion Pipeline uses stable source-derived identifiers such as client-{id}, card-{id}, pedido-{Codigo}, lead-{ID}, and agendamento-{ID}. With upsert semantics, the same window can run again without duplicating rows. That is the difference between a replayable ingestion system and a growing pile of duplicate records.
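Upsert semantics can be shown concretely. This sketch uses sqlite3's `ON CONFLICT` clause, which mirrors the PostgreSQL syntax Supabase would actually run; the table and statuses are illustrative assumptions:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE raw_events_faturado (
        external_id TEXT PRIMARY KEY,
        payload     TEXT NOT NULL,
        updated_at  TEXT NOT NULL
    )
""")

def upsert(external_id: str, payload: str, updated_at: str) -> None:
    # Same row key -> update in place, never a duplicate row.
    db.execute("""
        INSERT INTO raw_events_faturado (external_id, payload, updated_at)
        VALUES (?, ?, ?)
        ON CONFLICT(external_id) DO UPDATE SET
            payload = excluded.payload,
            updated_at = excluded.updated_at
    """, (external_id, payload, updated_at))

# The same window runs twice; the later fetch carries a late update.
upsert("pedido-1001", '{"status": "aberto"}', "2024-01-05")
upsert("pedido-1001", '{"status": "faturado"}', "2024-01-06")
count, = db.execute("SELECT COUNT(*) FROM raw_events_faturado").fetchone()
```

The second run overwrites rather than appends, which is exactly what makes repeated windows safe.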

GitHub Actions is enough for lightweight data collection

A permanent server is not always the right tradeoff. For this workload, GitHub Actions provides scheduled execution, manual replays through workflow_dispatch, centralized secrets, and run logs without maintaining a VM, container platform, or always-on worker.

The schedule design is practical: small, recent windows run more often, while larger or heavier syncs run daily. This lowers provider rate-limit pressure and reduces the chance of workflow overlap. It also keeps the operational model easy to audit because each source has its own workflow and run history.
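A workflow in this style might look like the following. This is a hypothetical sketch, not the project's real workflow file; the worker path, schedule, and secret names are all assumptions:

```yaml
# Hypothetical workflow: one source, one schedule, one run history.
name: ingest-hablla-recent
on:
  schedule:
    - cron: "0 */6 * * *"   # small recent window, every six hours
  workflow_dispatch: {}      # manual replay of a window
jobs:
  ingest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: python workers/hablla_recent.py
        env:
          HABLLA_TOKEN: ${{ secrets.HABLLA_TOKEN }}
          SUPABASE_DB_URL: ${{ secrets.SUPABASE_DB_URL }}
```

A heavier daily sync would be a sibling workflow with its own cron line, keeping each source's run history separate and auditable.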

API-specific collection choices

Raw ingestion does not mean all sources are treated identically. Hablla attendants are collected day by day because summary endpoints aggregate by period. Zoho has separate full and recent jobs, so the system can refresh recent records frequently without rerunning heavy windows all day. Zenvia call data often makes more sense as a daily close-of-day collection, and SIGE billing data is likewise collected day by day to keep ERP windows predictable.
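Day-by-day collection reduces to slicing a recent window into one-day requests. A minimal sketch, with the helper name and window policy as assumptions:

```python
from datetime import date, timedelta

def daily_windows(days_back: int, today: date) -> list[tuple[date, date]]:
    """Split a recent window into one-day slices, oldest first.

    One-day windows suit endpoints that aggregate by period (such as
    attendant summaries) and keep each provider request small.
    """
    return [
        (today - timedelta(days=n), today - timedelta(days=n))
        for n in range(days_back, 0, -1)
    ]

# e.g. the "last five days of SIGE orders" pattern from the text
windows = daily_windows(5, date(2024, 1, 10))
```

A worker then loops over `windows`, fetching and upserting one slice at a time, so a failed day can be replayed alone.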

Those choices are architecture decisions, not implementation trivia. The worker should respect the semantics of each API while still producing a consistent raw storage envelope.

Downstream SQL stays separate from ingestion

The most valuable SQL work usually happens after raw storage: joins between Hablla cards and persons, derivation of contact timelines, billing views from SIGE, lead attribution from Zoho, or operational telephony metrics from Zenvia. Keeping those transformations downstream means each SQL model can be versioned, tested, and replaced without weakening collection reliability.
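The separation can be illustrated with a derived view over raw tables. Again sqlite3 stands in for Supabase/PostgreSQL, and the table shapes, join key, and view name are illustrative assumptions:

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE raw_contact_hablla "
           "(external_id TEXT PRIMARY KEY, payload TEXT)")
db.execute("CREATE TABLE raw_events_hablla "
           "(external_id TEXT PRIMARY KEY, contact_id TEXT, payload TEXT)")
db.execute("INSERT INTO raw_contact_hablla VALUES ('client-7', ?)",
           (json.dumps({"name": "Ana"}),))
db.execute("INSERT INTO raw_events_hablla VALUES ('card-1', 'client-7', ?)",
           (json.dumps({"stage": "won"}),))

# The downstream model is just SQL over the raw layer. Dropping and
# recreating this view never touches collection reliability.
db.execute("""
    CREATE VIEW contact_events AS
    SELECT c.external_id AS contact, e.external_id AS event
    FROM raw_contact_hablla c
    JOIN raw_events_hablla e ON e.contact_id = c.external_id
""")
rows = db.execute("SELECT contact, event FROM contact_events").fetchall()
```

Versioning the view definition, rather than the ingestion code, is what lets the reporting layer evolve freely.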

This also makes the project a natural continuation of the older integration-worker pattern shown in ETL Pipelines in API Integration Workers. The difference is that Raw API Ingestion Pipeline moves the durable storage boundary from Google Sheets to Supabase raw tables.

Security: useful logs without leaking payloads

Public or semi-public workflow logs should never contain tokens, full API responses, complete error bodies, or raw payload dumps. Raw API Ingestion Pipeline keeps logs operational: status code, summarized error message, provider context, and run outcome. That is enough to diagnose most failures without exposing sensitive customer or credential data.

Why the architecture scales cleanly

Adding another integration follows a repeatable path: create a worker, define the destination raw table, choose the collection window, preserve the raw payload, define a stable external ID, add secrets, create a workflow, and sanitize logs. The architecture scales because each source has local complexity, but the storage contract remains consistent.