Data Engineering

Raw API Ingestion Pipeline

Node.js workers scheduled by GitHub Actions collect raw operational data from Hablla, Zoho Creator, Zenvia Voice, and SIGE ERP APIs, then persist reprocessable payloads into Supabase SQL raw tables.

Problem Context, APIs, SQL Storage, and Architecture

Raw API Ingestion Pipeline solves a data engineering problem that appears in many operational environments: third-party APIs are the source of truth, but analytics and reporting need a reliable raw layer that can be replayed, audited, and modeled later. The project keeps ingestion separate from business modeling so API payloads remain available even when SQL transformations, BI dashboards, and operational rules change.

  • Runtime: Node.js 20 workers executed locally or by GitHub Actions.
  • Storage: Supabase/PostgreSQL raw_* tables with raw payload retention.
  • APIs: Hablla persons/cards/services, Zoho Creator leads/scheduling, Zenvia Voice, SIGE ERP billing.
  • Reliability: idempotent external_id values, replayable windows, workflow_dispatch, sanitized logs.

Technical Scope

  • Stack: JavaScript, Node.js 20, GitHub Actions, Supabase, PostgreSQL, SQL, REST APIs, OAuth, cron scheduling.
  • System Type: raw data ingestion pipeline, API integration system, analytics staging layer, backend automation workflow.
  • SEO context: Supabase SQL architecture, raw JSON payload storage, ETL pipeline design, API data extraction, idempotent ingestion, CRM and ERP API integration.

Related work: Zoho Integration Worker, Hablla Integration Worker, Zenvia Integration Worker, and SIGE Integration Worker.

Architecture at a Glance

GitHub Actions / Local Runner -> Node.js ingestion workers -> Hablla API | Zoho Creator API | Zenvia Voice API | SIGE ERP API -> Supabase PostgreSQL raw_* tables -> SQL-derived layers, analytics, BI, and operational reporting
  • Hablla Clients: persons endpoint to raw_contact_hablla with client-{id} external IDs.
  • Hablla Cards: cards endpoint to raw_events_hablla with replayable day windows.
  • Hablla Attendants: daily services summary to raw_cs_avaliacao_atendimento to avoid period aggregation errors.
  • Zenvia Calls: voice call reports to raw_contact_telefonia for operational telephony analytics.
  • SIGE Faturamento: billed orders to raw_events_faturado with pedido-{Codigo} external IDs.
  • Zoho Leads: Creator lead records to raw_contact_site with full and recent sync modes.
  • Zoho Scheduling: scheduling records to raw_events_agendamento with month and recent windows.

Project Documentation

Raw API Ingestion Pipeline

Node 20 · GitHub Actions · Supabase

Raw API Ingestion Pipeline is a raw ingestion project for external operational APIs. It runs scheduled API calls, keeps transformation intentionally thin at the ingestion boundary, and stores the original API response inside Supabase raw tables for later SQL modeling, analytics, and BI layers.

Objective

The main goal is to decouple API collection from analytical modeling. Instead of writing directly to a dashboard-ready table or spreadsheet, each worker stores the raw payload with an idempotent identifier, a destination table, and enough metadata to support reprocessing, as the sketch after the list below illustrates.

  • Collect third-party API data on a schedule.
  • Persist reliable raw records in Supabase/PostgreSQL.
  • Keep historical payloads reprocessable without another external API call.
  • Let SQL, BI, and operational models evolve outside the ingestion layer.
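The sketch below shows that envelope in plain JavaScript. Only the external_id convention and the raw_* table names come from this document; the function, field, and sample names are assumptions made for illustration, not the project's actual code.

    // Hypothetical envelope builder: wraps one raw API record without reshaping it.
    // Only an idempotent identifier and collection metadata are added around the payload.
    function buildEnvelopeRow(record, { idPrefix, source, window }) {
      return {
        external_id: `${idPrefix}-${record.id}`, // e.g. client-{id} for Hablla persons
        source,                                  // which provider endpoint produced the record
        collected_at: new Date().toISOString(),  // when this ingestion run happened
        collection_window: window,               // replayable window, e.g. { since, until }
        payload: record,                         // original API response, stored as-is
      };
    }

    // Usage: map one page of a persons endpoint into rows destined for raw_contact_hablla.
    const personsPage = [{ id: 101, name: 'Example Person' }]; // stand-in API response
    const rows = personsPage.map((person) =>
      buildEnvelopeRow(person, {
        idPrefix: 'client',
        source: 'hablla/persons',
        window: { since: '2024-05-01', until: '2024-05-07' },
      })
    );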

Architectural Principles

  • The saved payload remains raw.
  • The ingestion envelope can vary: external_id, destination table, collection window, logs, and scheduling.
  • The raw table mirrors the source API. It is not the business model.
  • Logs expose operational metadata, not secrets or full sensitive payloads.

Why GitHub Actions and Supabase

For lightweight ingestion, GitHub Actions plus Supabase removes the need for a permanent worker or VM. Scheduled workflows provide auditability, manual replay with workflow_dispatch, centralized GitHub Secrets, and simple cost control while Supabase acts as a durable SQL storage layer for raw records.
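As an illustration, a worker in this setup might read its credentials and replay parameters from environment variables that the workflow injects from GitHub Secrets and workflow_dispatch inputs. The variable names below are assumptions, not the project's actual configuration.

    // Hypothetical configuration loading: secrets come from GitHub Secrets,
    // and a workflow_dispatch input can override the default collection window.
    const config = {
      habllaToken: process.env.HABLLA_TOKEN,
      supabaseUrl: process.env.SUPABASE_URL,
      supabaseKey: process.env.SUPABASE_SERVICE_ROLE_KEY,
      windowDays: Number(process.env.WINDOW_DAYS ?? 7), // manual replays can widen this
    };

    for (const [key, value] of Object.entries(config)) {
      if (value === undefined || Number.isNaN(value)) {
        throw new Error(`Missing or invalid configuration: ${key}`); // never log the value itself
      }
    }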

SQL and Raw Data Strategy

The important SQL design decision is to store raw API outputs first and derive normalized models later. Tables such as raw_contact_hablla, raw_events_hablla, raw_contact_telefonia, raw_events_faturado, raw_contact_site, and raw_events_agendamento can support downstream SQL views, materialized tables, joins, and BI-friendly datasets without losing the original provider shape.

The external_id field makes repeated collection windows safe: the same five-day, seven-day, fifteen-day, or monthly window can be replayed without duplicating rows when upsert logic is applied consistently.
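A sketch of that replay-safe write path, assuming the @supabase/supabase-js client and a unique constraint on external_id in each raw table; the helper name and error handling are illustrative.

    import { createClient } from '@supabase/supabase-js';

    const supabase = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_SERVICE_ROLE_KEY);

    // Re-running the same collection window is safe: rows conflict on external_id
    // and are updated in place instead of being inserted a second time.
    async function persistRaw(table, rows) {
      const { error } = await supabase
        .from(table)                                  // e.g. 'raw_events_faturado'
        .upsert(rows, { onConflict: 'external_id' }); // needs a unique index on external_id
      if (error) throw new Error(`Upsert into ${table} failed: ${error.message}`);
    }

With this pattern, replaying a seven-day window simply rewrites the same external_id rows with the latest payload.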

Current Integrations

  • Hablla Clients: persons endpoint to raw_contact_hablla, using client-{id}.
  • Hablla Cards: cards endpoint to raw_events_hablla, with a default recent window.
  • Hablla Attendants: services summary to raw_cs_avaliacao_atendimento, collected day by day.
  • Zenvia Calls: voice call reports to raw_contact_telefonia, generally suited to daily close-of-day runs.
  • SIGE Faturamento: billed orders to raw_events_faturado, using pedido-{Codigo}.
  • Zoho Leads Full and Recent: Creator lead reports to raw_contact_site.
  • Zoho Scheduling Full and Recent: scheduling reports to raw_events_agendamento.

Scheduling Model

The project mixes frequent short windows with less frequent larger windows. Hablla clients/cards and recent Zoho jobs run several times per day. Zenvia, SIGE, Hablla attendants, and full Zoho loads run daily. This reduces API pressure, keeps fresh operational data available, and lowers the risk of workflow overlap.
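One way to express that mix, shown here as an assumption rather than the repository's real schedule, is a per-job collection window derived from a mode flag.

    // Hypothetical window selection: frequent jobs replay a short recent window,
    // while daily or full loads cover a larger period such as the current month.
    function collectionWindow(mode, now = new Date()) {
      const until = now.toISOString().slice(0, 10);
      if (mode === 'recent') {
        const since = new Date(now.getTime() - 7 * 24 * 60 * 60 * 1000);
        return { since: since.toISOString().slice(0, 10), until };
      }
      // 'full' mode: from the first day of the current month onward.
      const monthStart = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1));
      return { since: monthStart.toISOString().slice(0, 10), until };
    }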

Security and Logs

Because workflows can expose public run metadata, error logs are sanitized. The project avoids logging tokens, secrets, complete raw payloads, and full provider error responses. Logs focus on status codes, summarized messages, and operational context that can be debugged without leaking credentials.
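A small sketch of that sanitization around a generic fetch call; the truncation length and message format are illustrative choices, not the project's exact logging code.

    // Hypothetical sanitized error handling: keep the status code and a short,
    // truncated detail, but never tokens, full payloads, or complete provider bodies.
    async function fetchWithSanitizedErrors(url, options) {
      const response = await fetch(url, options);
      if (!response.ok) {
        const body = await response.text();
        console.error(`API call failed: status=${response.status} detail=${body.slice(0, 120)}`);
        throw new Error(`Upstream request failed with status ${response.status}`);
      }
      return response.json();
    }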

Migration From Google Sheets

The older pattern synchronized API data directly into Google Sheets after early mapping and filtering. Raw API Ingestion Pipeline moves the storage boundary earlier: raw API payloads are captured in Supabase first, then SQL modeling and analytics happen downstream. This improves auditability, replayability, and resilience against changing business rules.

Technology Stack

Node.js 20
JavaScript
GitHub Actions
Supabase
PostgreSQL / SQL
REST APIs
OAuth
Cron Workflows

View the Source Code

View on GitHub