Data Engineering

Raw API Ingestion Pipeline

Node.js workers scheduled by GitHub Actions collect raw operational data from Hablla, Zoho Creator, Zenvia Voice, and SIGE ERP APIs, then persist reprocessable payloads into Supabase SQL raw tables.

Problem Context, APIs, SQL Storage, and Architecture

Raw API Ingestion Pipeline solves a data engineering problem that appears in many operational environments: third-party APIs are the source of truth, but analytics and reporting need a reliable raw layer that can be replayed, audited, and modeled later. The project keeps ingestion separate from business modeling so API payloads remain available even when SQL transformations, BI dashboards, and operational rules change.

  • Runtime: Node.js 20 workers executed locally or by GitHub Actions.
  • Storage: Supabase/PostgreSQL raw_* tables with raw payload retention.
  • APIs: Hablla persons/cards/services, Zoho Creator leads/scheduling, Zenvia Voice, SIGE ERP billing.
  • Reliability: idempotent external_id values, replayable windows, workflow_dispatch, sanitized logs.

Technical Scope

  • Stack: JavaScript, Node.js 20, GitHub Actions, Supabase, PostgreSQL, SQL, REST APIs, OAuth, cron scheduling.
  • System Type: raw data ingestion pipeline, API integration system, analytics staging layer, backend automation workflow.
  • SEO context: Supabase SQL architecture, raw JSON payload storage, ETL pipeline design, API data extraction, idempotent ingestion, CRM and ERP API integration.

Related work: Zoho Integration Worker, Hablla Integration Worker, Zenvia Integration Worker, and SIGE Integration Worker.

Architecture at a Glance

GitHub Actions / Local Runner -> Node.js ingestion workers -> Hablla API | Zoho Creator API | Zenvia Voice API | SIGE ERP API -> Supabase PostgreSQL raw_* tables -> SQL-derived layers, analytics, BI, and operational reporting
  • Hablla Clients: persons endpoint to raw_contact_hablla with client-{id} external IDs.
  • Hablla Cards: cards endpoint to raw_events_hablla with replayable day windows.
  • Hablla Attendants: daily services summary to raw_cs_avaliacao_atendimento to avoid period aggregation errors.
  • Zenvia Calls: voice call reports to raw_contact_telefonia for operational telephony analytics.
  • SIGE Faturamento: billed orders to raw_events_faturado with pedido-{Codigo} external IDs.
  • Zoho Leads: Creator lead records to raw_contact_site with full and recent sync modes.
  • Zoho Scheduling: scheduling records to raw_events_agendamento with month and recent windows.

Project Documentation

Raw API Ingestion Pipeline

Node 20 · GitHub Actions · Supabase

Raw API Ingestion Pipeline is a raw ingestion project for external operational APIs. It runs scheduled API calls, keeps transformation intentionally thin at the ingestion boundary, and stores the original API response inside Supabase raw tables for later SQL modeling, analytics, and BI layers.

Objective

The main goal is to decouple API collection from analytical modeling. Instead of writing directly to a dashboard-ready table or spreadsheet, each worker stores the raw payload with an idempotent identifier, a destination table, and enough metadata to support reprocessing, as the sketch after the list below illustrates.

  • Collect third-party API data on a schedule.
  • Persist reliable raw records in Supabase/PostgreSQL.
  • Keep historical payloads reprocessable without another external API call.
  • Let SQL, BI, and operational models evolve outside the ingestion layer.
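The sketch below shows that envelope in plain JavaScript. Only the external_id convention and the raw_* table names come from this document; the function, field, and sample names are assumptions made for illustration, not the project's actual code.

    // Hypothetical envelope builder: wraps one raw API record without reshaping it.
    // Only an idempotent identifier and collection metadata are added around the payload.
    function buildEnvelopeRow(record, { idPrefix, source, window }) {
      return {
        external_id: `${idPrefix}-${record.id}`, // e.g. client-{id} for Hablla persons
        source,                                  // which provider endpoint produced the record
        collected_at: new Date().toISOString(),  // when this ingestion run happened
        collection_window: window,               // replayable window, e.g. { since, until }
        payload: record,                         // original API response, stored as-is
      };
    }

    // Usage: map one page of a persons endpoint into rows destined for raw_contact_hablla.
    const personsPage = [{ id: 101, name: 'Example Person' }]; // stand-in API response
    const rows = personsPage.map((person) =>
      buildEnvelopeRow(person, {
        idPrefix: 'client',
        source: 'hablla/persons',
        window: { since: '2024-05-01', until: '2024-05-07' },
      })
    );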

Architectural Principles

  • The saved payload remains raw.
  • The ingestion envelope can vary: external_id, destination table, collection window, logs, and scheduling.
  • The raw table mirrors the source API. It is not the business model.
  • Logs expose operational metadata, not secrets or full sensitive payloads.

Why GitHub Actions and Supabase

For lightweight ingestion, GitHub Actions plus Supabase removes the need for a permanent worker or VM. Scheduled workflows provide auditability, manual replay with workflow_dispatch, centralized GitHub Secrets, and simple cost control while Supabase acts as a durable SQL storage layer for raw records.
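As an illustration, a worker in this setup might read its credentials and replay parameters from environment variables that the workflow injects from GitHub Secrets and workflow_dispatch inputs. The variable names below are assumptions, not the project's actual configuration.

    // Hypothetical configuration loading: secrets come from GitHub Secrets,
    // and a workflow_dispatch input can override the default collection window.
    const config = {
      habllaToken: process.env.HABLLA_TOKEN,
      supabaseUrl: process.env.SUPABASE_URL,
      supabaseKey: process.env.SUPABASE_SERVICE_ROLE_KEY,
      windowDays: Number(process.env.WINDOW_DAYS ?? 7), // manual replays can widen this
    };

    for (const [key, value] of Object.entries(config)) {
      if (value === undefined || Number.isNaN(value)) {
        throw new Error(`Missing or invalid configuration: ${key}`); // never log the value itself
      }
    }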

SQL and Raw Data Strategy

The important SQL design decision is to store raw API outputs first and derive normalized models later. Tables such as raw_contact_hablla, raw_events_hablla, raw_contact_telefonia, raw_events_faturado, raw_contact_site, and raw_events_agendamento can support downstream SQL views, materialized tables, joins, and BI-friendly datasets without losing the original provider shape.

The external_id field makes repeated collection windows safe: the same five-day, seven-day, fifteen-day, or monthly window can be replayed without duplicating rows when upsert logic is applied consistently.
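A sketch of that replay-safe write path, assuming the @supabase/supabase-js client and a unique constraint on external_id in each raw table; the helper name and error handling are illustrative.

    import { createClient } from '@supabase/supabase-js';

    const supabase = createClient(process.env.SUPABASE_URL, process.env.SUPABASE_SERVICE_ROLE_KEY);

    // Re-running the same collection window is safe: rows conflict on external_id
    // and are updated in place instead of being inserted a second time.
    async function persistRaw(table, rows) {
      const { error } = await supabase
        .from(table)                                  // e.g. 'raw_events_faturado'
        .upsert(rows, { onConflict: 'external_id' }); // needs a unique index on external_id
      if (error) throw new Error(`Upsert into ${table} failed: ${error.message}`);
    }

With this pattern, replaying a seven-day window simply rewrites the same external_id rows with the latest payload.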

Current Integrations

  • Hablla Clients: persons endpoint to raw_contact_hablla, using client-{id}.
  • Hablla Cards: cards endpoint to raw_events_hablla, with a default recent window.
  • Hablla Attendants: services summary to raw_cs_avaliacao_atendimento, collected day by day.
  • Zenvia Calls: voice call reports to raw_contact_telefonia, generally suited to daily close-of-day runs.
  • SIGE Faturamento: billed orders to raw_events_faturado, using pedido-{Codigo}.
  • Zoho Leads Full and Recent: Creator lead reports to raw_contact_site.
  • Zoho Scheduling Full and Recent: scheduling reports to raw_events_agendamento.

Scheduling Model

The project mixes frequent short windows with less frequent larger windows. Hablla clients/cards and recent Zoho jobs run several times per day. Zenvia, SIGE, Hablla attendants, and full Zoho loads run daily. This reduces API pressure, keeps fresh operational data available, and lowers the risk of workflow overlap.
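One way to express that mix, shown here as an assumption rather than the repository's real schedule, is a per-job collection window derived from a mode flag.

    // Hypothetical window selection: frequent jobs replay a short recent window,
    // while daily or full loads cover a larger period such as the current month.
    function collectionWindow(mode, now = new Date()) {
      const until = now.toISOString().slice(0, 10);
      if (mode === 'recent') {
        const since = new Date(now.getTime() - 7 * 24 * 60 * 60 * 1000);
        return { since: since.toISOString().slice(0, 10), until };
      }
      // 'full' mode: from the first day of the current month onward.
      const monthStart = new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1));
      return { since: monthStart.toISOString().slice(0, 10), until };
    }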

Security and Logs

Because workflows can expose public run metadata, error logs are sanitized. The project avoids logging tokens, secrets, complete raw payloads, and full provider error responses. Logs focus on status codes, summarized messages, and operational context that can be debugged without leaking credentials.
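A small sketch of that sanitization around a generic fetch call; the truncation length and message format are illustrative choices, not the project's exact logging code.

    // Hypothetical sanitized error handling: keep the status code and a short,
    // truncated detail, but never tokens, full payloads, or complete provider bodies.
    async function fetchWithSanitizedErrors(url, options) {
      const response = await fetch(url, options);
      if (!response.ok) {
        const body = await response.text();
        console.error(`API call failed: status=${response.status} detail=${body.slice(0, 120)}`);
        throw new Error(`Upstream request failed with status ${response.status}`);
      }
      return response.json();
    }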

Migration From Google Sheets

The older pattern synchronized API data directly into Google Sheets after early mapping and filtering. Raw API Ingestion Pipeline moves the storage boundary earlier: raw API payloads are captured in Supabase first, then SQL modeling and analytics happen downstream. This improves auditability, replayability, and resilience against changing business rules.

Technology Stack

Node.js 20
JavaScript
GitHub Actions
Supabase
PostgreSQL / SQL
REST APIs
OAuth
Cron Workflows

View the Source Code

View on GitHub