Lakehouse and warehouse architecture on open formats
Storage architected on open table formats — Iceberg, Delta, or Hudi — with compute engines layered on top. Snowflake, Databricks, BigQuery, Synapse, Redshift, and Microsoft Fabric are deployed where they fit the workload, not where vendor relationships push them. Bronze, silver, and gold zoning is implemented as a catalog discipline, not a folder convention. Storage is decoupled from compute, raw history is preserved immutably, and compute can be swapped without re-platforming the data.
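Zoning as a catalog discipline, rather than a folder convention, can be illustrated with a minimal stdlib-only sketch: a catalog object that owns the zone classification and the storage location, so compute engines resolve tables by name instead of hard-coding paths. All names here (`Catalog`, `TableEntry`, the `s3://` locations) are hypothetical illustrations, not any vendor's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TableEntry:
    name: str
    zone: str       # bronze / silver / gold — owned by the catalog, not the path
    location: str   # object-store URI; any compute engine resolves via the catalog

class Catalog:
    ZONES = ("bronze", "silver", "gold")

    def __init__(self):
        self._tables = {}

    def register(self, name: str, zone: str, location: str) -> None:
        # The catalog enforces zoning; a folder convention cannot.
        if zone not in self.ZONES:
            raise ValueError(f"unknown zone: {zone}")
        self._tables[name] = TableEntry(name, zone, location)

    def resolve(self, name: str) -> TableEntry:
        return self._tables[name]
```

Because engines resolve through the catalog, swapping compute means re-pointing a client, not rewriting paths baked into jobs.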
Feature stores for machine learning
A first-class feature serving layer with batch-online parity — Feast, Tecton, Databricks Feature Store, or native warehouse implementations. Features are versioned, tested, monitored for drift, and shared across training and inference paths. The feature store is the contract between data engineering and ML — not a copy of the warehouse for one team's convenience.
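The batch-online parity point can be sketched in a few lines: one versioned transform is the single source of feature logic, and both the batch materialisation path and the online lookup path call it. This is a stdlib-only illustration of the contract, not the Feast or Tecton API; `spend_ratio` and its inputs are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Feature:
    name: str
    version: int
    transform: Callable[[dict], float]  # identical logic for training and inference

def spend_ratio(row: dict) -> float:
    # Guard against division by zero for new accounts.
    return row["spend_30d"] / max(row["spend_365d"], 1.0)

SPEND_RATIO = Feature(name="spend_ratio", version=2, transform=spend_ratio)

def batch_materialise(rows: list[dict]) -> list[float]:
    # Training path: compute the feature over historical rows.
    return [SPEND_RATIO.transform(r) for r in rows]

def online_lookup(row: dict) -> float:
    # Serving path: same transform, so no training/serving skew.
    return SPEND_RATIO.transform(row)
```

Versioning the definition (rather than the warehouse copy) is what lets drift monitoring and rollback target a specific `version` of the logic.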
Vector stores and semantic indexes for generative AI
Embedding stores that operate as a governed platform component, not as a side project. pgvector, Qdrant, Milvus, Weaviate, and the native vector capabilities now appearing inside warehouses and lakehouses are matched to the workload — with embedding versioning, refresh schedules, source lineage, and access policies wired into the same governance plane as the structured data.
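What "governed" means in practice can be shown with a small sketch: an index that carries its embedding version and attaches source lineage to every entry, so a retrieval result is traceable and a model upgrade forces a new index rather than silently mixing vector spaces. This is a stdlib-only cosine-similarity illustration; the class and URIs are hypothetical, not the API of any of the stores named above.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class VectorIndex:
    def __init__(self, embedding_version: str):
        # Vectors from different models live in different spaces;
        # pinning the version prevents mixing them in one index.
        self.embedding_version = embedding_version
        self._entries: list[tuple[list[float], str]] = []

    def add(self, vec: list[float], source_uri: str) -> None:
        # Every vector keeps its source lineage for governance and audit.
        self._entries.append((vec, source_uri))

    def search(self, query: list[float], k: int = 3) -> list[tuple[str, float]]:
        scored = [(cosine(query, vec), uri) for vec, uri in self._entries]
        scored.sort(reverse=True)
        return [(uri, round(score, 4)) for score, uri in scored[:k]]
```

Refresh schedules and access policy would sit around this structure in the same governance plane as the structured tables.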
Real-time analytical engines
Sub-second analytical engines (ClickHouse, Pinot, Druid) and streaming materialised views (Materialize, RisingWave) integrated as serving tiers for workloads where the latency budget excludes a warehouse round-trip. Real-time serving inherits the same semantic definitions and quality contracts as the rest of the platform — so operational dashboards and analytical reports do not contradict each other.
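The streaming materialised-view idea reduces to maintaining an aggregate incrementally on each event, so reads hit pre-computed state instead of rescanning history — which is how the latency budget is met without a warehouse round-trip. The sketch below is a stdlib-only illustration of the mechanism, not the Materialize or RisingWave API.

```python
class MaterialisedCount:
    """Incrementally maintained equivalent of GROUP BY key, COUNT(*)."""

    def __init__(self):
        self._counts: dict[str, int] = {}

    def apply(self, event: dict) -> None:
        # Each arriving event updates the view in O(1);
        # no rescan of the event history is ever needed.
        key = event["key"]
        self._counts[key] = self._counts.get(key, 0) + 1

    def read(self, key: str) -> int:
        # Reads serve the maintained state at sub-second latency.
        return self._counts.get(key, 0)
```

Inheriting the platform's semantic definitions means the grouping key and the counting rule here would come from the shared metric layer, not be redefined per dashboard.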
Catalog, semantic layer, and governance plane
A unified catalog — Unity Catalog, Polaris, AWS Glue, Atlan, Collibra, Microsoft Purview — that owns lineage, classification, ownership, retention, and access decisions across every storage tier. A semantic layer (dbt Semantic Layer, Cube, AtScale, native warehouse semantic models) that ensures the same metric definition is reused by BI tools, ML pipelines, and AI agents alike. Governance metadata is produced by the platform; it is not assembled manually by a stewardship team.
Reference architectures matched to operating model
Centralised lakehouse, hub-and-spoke warehouse, data mesh, data fabric, or hybrid — chosen against the organisation's federation reality, regulatory geometry, and engineering maturity. Hatsya does not impose data mesh on a centralised organisation, and does not impose a centralised lakehouse on a federated one. The architecture is engineered to the operating model, not against it.
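As a deliberately crude sketch of matching architecture to operating model, the selection could be framed as a decision function over two of the dimensions named above. Real selection weighs regulatory geometry and maturity as well; the rules below are illustrative assumptions, not Hatsya's actual methodology.

```python
def suggest_architecture(federated_domains: bool,
                         strong_central_platform_team: bool) -> str:
    """Rule-of-thumb only: two inputs standing in for a fuller assessment."""
    if federated_domains and not strong_central_platform_team:
        # Autonomous domains, no central capacity: federate ownership too.
        return "data mesh"
    if federated_domains and strong_central_platform_team:
        # Domains stay autonomous but share a governed core.
        return "hub-and-spoke warehouse"
    # A genuinely centralised organisation gets a centralised platform.
    return "centralised lakehouse"
```

The point of the sketch is the direction of fit: the operating model is the input and the architecture is the output, never the reverse.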