Architecture overview - Bijak Cloud Docs

Regions

Bijak Cloud operates two Malaysian regions, with a third on the 2027 roadmap.

Region	Location	Status	Latency from KL
`my-cyberjaya`	Cyberjaya, Selangor	Active	< 10 ms
`my-iskandar`	Iskandar Puteri, Johor	Active	< 30 ms
`my-sabah`	Kota Kinabalu, Sabah	Roadmap	TBD

Every region runs the full platform stack: compute, storage, inference, RAG, and audit. There is no split between an “inference region” and a “storage region”; both run inside the same data centre to preserve data residency.
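As a rough illustration, the region table can be modelled as a small catalogue that client code filters before choosing where to deploy. The `Region` class and both helper functions are hypothetical names used only for this sketch and are not part of any Bijak Cloud SDK; the latency figures are the upper bounds from the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    name: str
    location: str
    status: str  # "Active" or "Roadmap", as listed in the table
    max_latency_from_kl_ms: Optional[int]  # upper bound; None where the table says TBD


# Values copied from the region table.
REGIONS = [
    Region("my-cyberjaya", "Cyberjaya, Selangor", "Active", 10),
    Region("my-iskandar", "Iskandar Puteri, Johor", "Active", 30),
    Region("my-sabah", "Kota Kinabalu, Sabah", "Roadmap", None),
]


def active_regions(regions: List[Region] = REGIONS) -> List[Region]:
    """Return only the regions that can host workloads today."""
    return [r for r in regions if r.status == "Active"]


def lowest_latency_region(regions: List[Region] = REGIONS) -> Region:
    """Pick the active region with the smallest published latency bound."""
    candidates = [r for r in active_regions(regions)
                  if r.max_latency_from_kl_ms is not None]
    return min(candidates, key=lambda r: r.max_latency_from_kl_ms)
```

Calling `lowest_latency_region()` with the catalogue above selects `my-cyberjaya`; roadmap regions are never candidates.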

Networking

Each region is a single fault-isolated zone with three physically separate network planes:

Customer plane — public API endpoints, TLS 1.3, Anycast-routed.
Internal plane — service-to-service communication, mesh-encrypted with WireGuard.
Management plane — operator access via SSO + hardware key, never exposed to customer workloads.

Cross-region replication is opt-in per dataset. The default is no replication — data stays in the region where it was written.
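The opt-in behaviour can be sketched as a dataset record whose replication targets start empty. This is a minimal model under assumed names: `Dataset`, `enable_replication`, and `storage_regions` are hypothetical and do not describe the actual API.

```python
from dataclasses import dataclass, field
from typing import Set

# Active regions from the region table; roadmap regions are excluded.
KNOWN_REGIONS = {"my-cyberjaya", "my-iskandar"}


@dataclass
class Dataset:
    name: str
    home_region: str
    # Empty by default: data stays in the region where it was written.
    replicate_to: Set[str] = field(default_factory=set)

    def enable_replication(self, target: str) -> None:
        """Explicitly opt this dataset in to replication to one other region."""
        if target not in KNOWN_REGIONS:
            raise ValueError(f"unknown or inactive region: {target}")
        if target == self.home_region:
            raise ValueError("target must differ from the home region")
        self.replicate_to.add(target)

    def storage_regions(self) -> Set[str]:
        """Every region that holds a copy of this dataset."""
        return {self.home_region} | self.replicate_to
```

A newly created dataset reports only its home region from `storage_regions()` until `enable_replication` is called for it.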

Identity

Identity is built on a single tenant model with role-based access:

Admin — full control over the workspace, including billing and access management.
Developer — read/write access to compute, inference, and RAG resources.
Auditor — read-only access to audit logs and configuration, no data plane access.

SSO is supported via SAML 2.0 and OIDC. Service accounts support key rotation without downtime. Audit logs record every authentication event.
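The three roles map naturally onto a permission table. The sketch below is one way to express them; the resource labels (`"compute"`, `"audit_logs"`, and so on) and the `is_allowed` function are assumptions made for illustration. It also assumes Developers have no access to audit logs or billing, which the role descriptions imply but do not state outright.

```python
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    DEVELOPER = "developer"
    AUDITOR = "auditor"


# Hypothetical resource labels for the data plane.
DATA_PLANE = {"compute", "inference", "rag"}

PERMISSIONS = {
    # Developer: read/write on compute, inference, and RAG resources.
    Role.DEVELOPER: {(res, act) for res in DATA_PLANE for act in ("read", "write")},
    # Auditor: read-only on audit logs and configuration, nothing on the data plane.
    Role.AUDITOR: {("audit_logs", "read"), ("configuration", "read")},
}


def is_allowed(role: Role, resource: str, action: str) -> bool:
    """Return True if the role may perform the action on the resource."""
    if role is Role.ADMIN:
        return True  # full control, including billing and access management
    return (resource, action) in PERMISSIONS.get(role, set())
```

For example, `is_allowed(Role.AUDITOR, "inference", "read")` is `False`, reflecting that Auditors have no data plane access.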

Storage layout

Customer data is stored across three tiers:

Hot — SSD-backed, encrypted with the workspace’s HSM-managed key. Default for inference logs and active RAG corpora.
Warm — object storage, encrypted with the same key, retrieval within minutes. Default for backup snapshots.
Cold — archival storage, encrypted with a rotated key, retrieval within hours. Default for logs past the retention window.

Deletion cascades through all tiers. Cold-tier deletion includes cryptographic shredding of the per-record encryption keys.
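The tier defaults and the cascading delete can be sketched as follows. The data-kind labels, the 90-day retention window, and the in-memory `stores` dictionary are all assumptions for illustration; the real retention window is configured elsewhere, and this sketch does not model key shredding.

```python
from enum import Enum
from typing import Dict, List


class Tier(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"


# Default tier per data kind, following the tier descriptions above.
DEFAULT_TIER = {
    "inference_log": Tier.HOT,
    "rag_corpus": Tier.HOT,
    "backup_snapshot": Tier.WARM,
}


def default_tier(kind: str, age_days: int = 0, retention_days: int = 90) -> Tier:
    """Choose a tier; logs older than the (assumed) retention window go cold."""
    if kind == "inference_log" and age_days > retention_days:
        return Tier.COLD
    return DEFAULT_TIER[kind]


def delete_everywhere(record_id: str, stores: Dict[Tier, dict]) -> List[Tier]:
    """Remove a record from every tier and return the tiers that held it."""
    removed = []
    for tier in Tier:  # iterates HOT, WARM, COLD in definition order
        if stores[tier].pop(record_id, None) is not None:
            removed.append(tier)
    return removed
```

Iterating over every tier, rather than only the record's current one, is what makes the delete a cascade: stale copies in warm or cold storage are removed in the same call.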

Compute

GPU compute is provisioned from a pool of H100 and A100 instances. Each instance is dedicated to a single customer — there is no oversubscription. Compute is region-pinned; an instance launched in Cyberjaya stays in Cyberjaya for its entire lifetime.

Inference endpoints run on a managed fleet with autoscaling based on queue depth and latency. Endpoint configuration (model, region, scaling bounds) is exposed via the API and dashboard.
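To make the scaling inputs concrete, here is a minimal sketch of a scaling decision driven by queue depth and latency and clamped to the endpoint's scaling bounds. The function name, the per-replica queue target, and the latency target are assumed values chosen for illustration; the managed fleet's actual policy is not documented here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingBounds:
    min_replicas: int
    max_replicas: int


def desired_replicas(current: int, queue_depth: int, p95_latency_ms: float,
                     bounds: ScalingBounds,
                     target_queue_per_replica: int = 4,
                     target_latency_ms: float = 500.0) -> int:
    """Return a replica count that satisfies both signals within the bounds."""
    # Enough replicas to keep each one's share of the queue at or below target.
    by_queue = math.ceil(queue_depth / target_queue_per_replica)
    # Scale proportionally to how far observed latency is from the target.
    by_latency = math.ceil(current * p95_latency_ms / target_latency_ms)
    wanted = max(by_queue, by_latency)
    # Clamp to the configured scaling bounds.
    return max(bounds.min_replicas, min(bounds.max_replicas, wanted))
```

Taking the larger of the two estimates means either signal alone can trigger a scale-up, while a scale-down happens only when both agree.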

Observability

Every customer has read-only access to their platform metrics: inference latency, GPU utilisation, RAG query rates, and error budgets. Metrics are exportable in OpenTelemetry format to any compatible backend.

Audit logs are append-only and signed. Logs are queryable in the dashboard and exportable as JSON or CSV.
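One common way to get an append-only, signed log is to chain entries and sign each with a keyed hash. The sketch below assumes HMAC-SHA256 over each entry plus the previous entry's signature; the actual signing scheme used by the platform is not specified here, and the `AuditLog` class is hypothetical.

```python
import hashlib
import hmac
import json


class AuditLog:
    """In-memory append-only log; each entry is chained to the one before it."""

    def __init__(self, signing_key: bytes):
        self._key = signing_key
        self._entries = []

    def _sign(self, event: dict, prev: str) -> str:
        # Canonical JSON so the same content always yields the same signature.
        payload = json.dumps({"event": event, "prev": prev}, sort_keys=True)
        return hmac.new(self._key, payload.encode(), hashlib.sha256).hexdigest()

    def append(self, event: dict) -> dict:
        prev = self._entries[-1]["signature"] if self._entries else ""
        entry = {"event": event, "prev": prev, "signature": self._sign(event, prev)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every signature; any edit or reordering breaks the chain."""
        prev = ""
        for entry in self._entries:
            expected = self._sign(entry["event"], prev)
            if entry["prev"] != prev or not hmac.compare_digest(expected, entry["signature"]):
                return False
            prev = entry["signature"]
        return True

    def export_json(self) -> str:
        return json.dumps(self._entries, indent=2)
```

Because each signature covers the previous one, altering or removing an earlier entry invalidates every entry after it, which is what lets a reader of an exported log detect tampering.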

Next steps

Read Concepts: Sovereignty for the compliance foundation.
Review the PDPA compliance guide for control mappings.
Contact us for a custom architecture review.