sankofa-hw-infra/docs/offer-ingestion.md
Offer ingestion (scrape and email)

Offers can be ingested from external sources so they appear in the database for potential purchases, without manual data entry.

Sources

  1. Scrape: site content from e.g. theserverstore.com (Peter as Manager). A scraper job fetches pages, parses offer-like content, and creates offer records.
  2. Email: a dedicated mailbox accepts messages (e.g. from Sergio and others); a pipeline parses them and creates offer records.

Ingested offers are stored with:

  • source: scraped or email
  • source_ref: URL (scrape) or email message id (email)
  • source_metadata: optional JSON (e.g. sender, subject, page title, contact name)
  • ingested_at: timestamp of ingestion
  • vendor_id: optional; may be null until procurement assigns the offer to a vendor
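The stored fields above can be sketched as a TypeScript type. This is illustrative only: the field names follow this doc, but the IngestedOffer name and the helper function are hypothetical, not code from the repo.

```typescript
// Sketch of the ingestion-related columns described above.
// Field names follow this doc; the type and helper are illustrative.
type IngestionSource = "scraped" | "email";

interface IngestedOffer {
  source: IngestionSource;
  source_ref?: string; // URL (scrape) or email message id (email)
  source_metadata?: Record<string, unknown>; // sender, subject, page title, ...
  ingested_at: string; // ISO-8601 timestamp set at ingestion time
  vendor_id: string | null; // null until procurement assigns a vendor
}

function newIngestedOffer(
  source: IngestionSource,
  source_ref?: string,
  source_metadata?: Record<string, unknown>,
): IngestedOffer {
  return {
    source,
    source_ref,
    source_metadata,
    ingested_at: new Date().toISOString(),
    vendor_id: null, // unassigned until procurement links a vendor
  };
}
```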

API: ingestion endpoint

Internal or automated callers use a dedicated endpoint, secured by an API key (no user JWT).

POST /api/v1/ingestion/offers

  • Auth: Header x-ingestion-api-key must equal the environment variable INGESTION_API_KEY. If missing or wrong, returns 401.
  • Org: Header x-org-id (defaults to default) specifies the org for the new offer.
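The header checks above could look roughly like this sketch (the header and env-var names come from this doc; the function name and return shape are hypothetical, not the API's actual handler):

```typescript
// Illustrative sketch of the ingestion auth/org header checks.
// Header names and the 401 behavior come from this doc; the rest is assumed.
interface AuthResult {
  status: number; // 401 on missing/wrong key, 200 when accepted
  orgId?: string; // org the new offer will belong to
}

function checkIngestionHeaders(
  headers: Record<string, string | undefined>,
  expectedApiKey: string, // value of INGESTION_API_KEY in the API's environment
): AuthResult {
  if (!expectedApiKey || headers["x-ingestion-api-key"] !== expectedApiKey) {
    return { status: 401 }; // missing or wrong key
  }
  // x-org-id is optional and falls back to "default"
  return { status: 200, orgId: headers["x-org-id"] ?? "default" };
}
```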

Body (JSON):

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| source | "scraped" \| "email" | yes | Ingestion source |
| source_ref | string | no | URL or message id |
| source_metadata | object | no | e.g. { "sender": "Sergio", "subject": "...", "page_url": "..." } |
| vendor_id | UUID | no | Vendor to attach; omit for unassigned |
| sku | string | no | |
| mpn | string | no | |
| quantity | number | yes | |
| unit_price | string | yes | Decimal |
| incoterms | string | no | |
| lead_time_days | number | no | |
| country_of_origin | string | no | |
| condition | string | no | |
| warranty | string | no | |
| evidence_refs | array | no | [{ "key": "s3-key", "hash": "..." }] |
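A caller could check the required fields before POSTing with a validator along these lines (a sketch based on the table above; the function name and error messages are illustrative, and the unit_price format check is an assumption, since the doc only says "Decimal"):

```typescript
// Illustrative validator for the request body above. Field names and
// required/optional flags follow the table; the implementation is a sketch.
function validateOfferPayload(
  body: Record<string, unknown>,
): { valid: boolean; errors: string[] } {
  const errors: string[] = [];
  if (body.source !== "scraped" && body.source !== "email") {
    errors.push('source must be "scraped" or "email"');
  }
  if (typeof body.quantity !== "number" || body.quantity <= 0) {
    errors.push("quantity must be a positive number");
  }
  // unit_price is a decimal carried as a string (avoids float rounding);
  // the exact accepted format is an assumption here.
  if (typeof body.unit_price !== "string" || !/^\d+(\.\d+)?$/.test(body.unit_price)) {
    errors.push('unit_price must be a decimal string, e.g. "450.00"');
  }
  return { valid: errors.length === 0, errors };
}
```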

Response: 201 with the created offer (including id, source, source_ref, source_metadata, ingested_at).

Example (scrape):

{
  "source": "scraped",
  "source_ref": "https://theserverstore.com/...",
  "source_metadata": { "contact": "Peter", "site": "theserverstore.com" },
  "vendor_id": null,
  "sku": "DL380-G9",
  "quantity": 2,
  "unit_price": "450.00",
  "condition": "refurbished"
}

Example (email):

{
  "source": "email",
  "source_ref": "msg-12345",
  "source_metadata": { "from": "sergio@example.com", "subject": "Quote for R630" },
  "vendor_id": null,
  "mpn": "PowerEdge R630",
  "quantity": 1,
  "unit_price": "320.00"
}

Scraper (e.g. theserverstore.com)

  • Responsibility: Fetch pages (respecting robots.txt and rate limits), extract product/offer fields, then POST each offer to /api/v1/ingestion/offers.
  • Where: Can run as a scheduled job in apps/ or packages/, or as an external service that calls the API. No scraper implementation is in-repo yet; this doc defines the contract.
  • Vendor: If the site is known (e.g. The Server Store, Peter as Manager), the scraper can resolve or create a vendor and pass vendor_id; otherwise leave null for procurement to assign later.
  • Idempotency: Use source_ref (e.g. canonical product URL) so the same offer is not duplicated; downstream you can upsert by (org_id, source, source_ref) if desired.
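The idempotency point above can be sketched as a key builder for the suggested (org_id, source, source_ref) upsert. The separator, function name, and URL normalization rules are illustrative assumptions:

```typescript
// Sketch of the dedupe key suggested above: upsert by (org_id, source, source_ref).
// The "|" separator and the normalization rules are illustrative choices.
function offerDedupeKey(
  orgId: string,
  source: "scraped" | "email",
  sourceRef: string,
): string {
  let ref = sourceRef;
  if (source === "scraped") {
    // Canonicalize scraped URLs so fragments and trailing slashes
    // don't create duplicate offers for the same product page.
    const u = new URL(sourceRef);
    u.hash = "";
    ref = u.toString().replace(/\/$/, "");
  }
  return [orgId, source, ref].join("|");
}
```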

Email intake (e.g. Sergio and others)

  • Flow: Incoming messages to a dedicated mailbox (e.g. offers@your-org.com) are read by an IMAP poller or processed via an inbound webhook (SendGrid, Mailgun, etc.). The pipeline parses sender, subject, body, and optional attachments, then POSTs one or more payloads to /api/v1/ingestion/offers.
  • Storing raw email: Attachments or full message can be uploaded to object storage (e.g. S3/MinIO) and referenced in evidence_refs or source_metadata (e.g. raw_message_key).
  • Vendor matching: Match sender address or name to an existing vendor and set vendor_id when possible; otherwise leave null and set source_metadata.sender / from for later assignment.
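The vendor-matching step above could be as simple as a sender-domain lookup. This is a sketch: the lookup-by-domain strategy, function name, and sample data are all assumptions, not the pipeline's actual logic:

```typescript
// Illustrative sender -> vendor matching for email intake.
// Matching by sender domain is one possible strategy, assumed here.
function matchVendorByEmail(
  fromAddress: string,
  vendorsByDomain: Map<string, string>, // domain -> vendor_id
): string | null {
  const domain = fromAddress.split("@")[1]?.toLowerCase();
  if (!domain) return null; // malformed address: leave unassigned
  // Return the vendor_id when the domain is known; null leaves the
  // offer unassigned so procurement can link a vendor later.
  return vendorsByDomain.get(domain) ?? null;
}
```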

Configuration

  • Set INGESTION_API_KEY in the environment where the API runs. Scraper and email pipeline must use the same value in x-ingestion-api-key.
  • Use x-org-id on each request to target the correct org.

Procurement workflow

  • Ingested offers appear in the offers list with source = scraped or email and optional vendor_id.
  • Offers with vendor_id null are “unassigned”; procurement can assign them to a vendor (PATCH offer or create/link vendor then update offer).
  • Existing RBAC and org/site scoping apply; audit can track creation via ingested_at and source_metadata.
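Assigning an unassigned offer might look like the following request. The PATCH route and payload shape are assumptions inferred from the workflow above, not a documented endpoint:

```
PATCH /api/v1/offers/<offer-id>
Content-Type: application/json

{ "vendor_id": "<vendor-uuid>" }
```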