PDPA compliance for AI workloads: a practical checklist — Bijak Cloud

Malaysia's Personal Data Protection Act 2010 (PDPA) was drafted before foundation models were on most product roadmaps. Yet the seven principles it sets out map cleanly onto AI workloads, if you know where to look. This post walks through each principle and shows how it applies to training data, inference logs, and embeddings.

The seven principles

PDPA is built on seven principles. They are short on detail and long on obligation, which is exactly what makes them durable across new technology.

1. General Principle

Personal data must be processed lawfully and in a manner that respects the privacy of the data subject. For AI, “lawfully” means you have a documented basis, such as consent or necessity for performing a contract with the data subject (the PDPA has no general GDPR-style “legitimate interest” ground), and “respect” means the model must not reconstruct or infer personal data it was not given.

2. Notice and Choice Principle

The data subject must be told what is collected, why, and what choices they have. AI systems must surface model cards, data-source disclosures, and opt-out pathways.

3. Disclosure Principle

Personal data must not be disclosed without consent or another lawful basis. For AI, this means every outbound call — to a remote inference endpoint, a third-party embedding service, or a logging pipeline — counts as a disclosure.
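
One way to make the "every outbound call is a disclosure" rule enforceable is to route calls through a check against a register of approved recipients. The sketch below is a minimal illustration, not a complete egress control: the hostnames, the `APPROVED_RECIPIENTS` register, and the `check_outbound` helper are all hypothetical names invented for this example.

```python
from urllib.parse import urlparse

# Hypothetical register: recipient hostname -> documented basis for disclosure.
APPROVED_RECIPIENTS = {
    "inference.internal.example": "in-tenant inference, no third party",
    "embeddings.vendor.example": "data processing agreement on file",
}


class UnapprovedDisclosure(Exception):
    """Raised when a call targets a host with no documented basis."""


def check_outbound(url: str) -> str:
    """Return the documented basis for calling `url`, or raise."""
    host = urlparse(url).hostname
    if host is None or host not in APPROVED_RECIPIENTS:
        raise UnapprovedDisclosure(
            f"no documented basis for disclosure to {host!r}"
        )
    return APPROVED_RECIPIENTS[host]
```

Calling `check_outbound` before each request means an unregistered logging or embedding endpoint fails loudly instead of silently becoming a new recipient of personal data.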

4. Security Principle

Personal data must be protected by reasonable security safeguards. For AI workloads, that typically means encryption at rest, in transit, and during compute; HSM-backed key management; and immutable audit logs.
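
To illustrate the audit-log requirement, the sketch below chains each log entry to the hash of the previous one, so that editing an earlier entry is detectable. This is tamper evidence only, under the assumption that real immutability would additionally rely on write-once storage; the function names and the entry layout are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with this event."""
    payload = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> dict:
    """Append `event` to `log`, linking it to the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "event": event, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Return True only if no entry has been altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if _digest(prev_hash, entry["event"]) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

An auditor can rerun `verify_chain` over an exported log; any edit to a past event changes its hash and breaks every link after it.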

5. Retention Principle

Personal data must be deleted once the purpose for which it was collected is fulfilled. AI inference logs and embeddings must have a documented retention window and a deletion path.
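
A retention window is easiest to audit when it is expressed as data and enforced by a scheduled job. The following sketch assumes each record carries a `kind` and a timezone-aware `created_at`; the 30-day and 90-day windows are illustrative values, not figures taken from the PDPA.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention windows per record kind (illustrative values only).
RETENTION = {
    "inference_log": timedelta(days=30),
    "embedding": timedelta(days=90),
}


def purge_expired(records: list, now: datetime) -> list:
    """Return the records still inside their retention window."""
    kept = []
    for record in records:
        window = RETENTION[record["kind"]]
        if now - record["created_at"] < window:
            kept.append(record)
    return kept


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [{"id": 1, "kind": "inference_log", "created_at": now - timedelta(days=45)}]
    print(purge_expired(sample, now))  # the 45-day-old log is dropped
```

A record whose `kind` has no entry in `RETENTION` raises a `KeyError`, which is deliberate in this sketch: data with no documented window should not be stored quietly.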

6. Data Integrity Principle

Personal data must be accurate, complete, and not misleading. For AI this means your training datasets, label hierarchies, and human-feedback pipelines need version control and provenance.
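
Provenance can start as something as simple as a manifest of content hashes recorded with each dataset version. The sketch below is one possible shape, assuming datasets are available as named byte strings; `build_manifest` and `changed_files` are hypothetical helpers, and a production setup would more likely lean on a dedicated data-versioning tool.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest of a file's content."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(files: dict, source: str, version: str) -> dict:
    """Record where a dataset version came from and what it contained.

    `files` maps a file name to its raw bytes.
    """
    return {
        "source": source,
        "version": version,
        "files": {name: fingerprint(data) for name, data in sorted(files.items())},
    }


def changed_files(old: dict, new: dict) -> list:
    """Names of files added, removed, or modified between two manifests."""
    names = set(old["files"]) | set(new["files"])
    return sorted(
        name for name in names
        if old["files"].get(name) != new["files"].get(name)
    )
```

Storing a manifest alongside every training run makes it possible to say which exact version of a dataset, and which source, a given model was trained on.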

7. Access Principle

Data subjects must be able to access and correct their personal data. AI systems must support data-subject access requests (DSARs), including requests that cover model-derived inferences about them.
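
A DSAR is only answerable if every store, including the one holding derived inferences, can be queried by data subject. The sketch below assumes each record carries a `subject_id` field and that stores can be iterated as lists; the store names and the `collect_subject_data` helper are hypothetical.

```python
def collect_subject_data(subject_id: str, stores: dict) -> dict:
    """Gather every record about one data subject, grouped by store.

    `stores` maps a store name to a list of record dicts.
    Records without a `subject_id` are never returned, which is why
    tagging derived data at write time matters.
    """
    return {
        name: [r for r in records if r.get("subject_id") == subject_id]
        for name, records in stores.items()
    }


if __name__ == "__main__":
    stores = {
        "profiles": [{"subject_id": "u1", "email": "aina@example.com"}],
        "inference_logs": [{"subject_id": "u1", "prompt": "reset my password"}],
        "derived_inferences": [{"subject_id": "u1", "label": "churn_risk_high"}],
    }
    for store, rows in collect_subject_data("u1", stores).items():
        print(store, rows)
```

The design choice worth noting is that the inference store is treated like any other: a model-derived label about a person is returned alongside the data they supplied directly.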

Where AI workloads introduce new risk

Three surfaces create the bulk of PDPA exposure in modern AI stacks:

Training data. Most enterprise AI is fine-tuned on internal documents — support tickets, emails, customer notes. These almost always contain personal data. Without a documented legal basis, you cannot train on them.

Inference logs. Every prompt and completion that contains personal data is a potential disclosure event. If you log them to a remote observability tool, that tool becomes a sub-processor and inherits PDPA obligations.

Embeddings. Vector embeddings can be inverted to recover significant portions of the original text. Treating embeddings as anonymous data is a common mistake — they are personal data when derived from personal data.
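
One mitigation that applies to both the inference-log and embedding surfaces is to redact obvious identifiers before text is logged or embedded. The sketch below is a deliberately narrow, pattern-based example covering email addresses and the 12-digit NRIC number format; it is an assumption-laden heuristic that will miss names, addresses, and free-form identifiers, so it reduces exposure rather than anonymising the text.

```python
import re

# Illustrative patterns only; real deployments need broader detection.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("NRIC", re.compile(r"\b\d{6}-\d{2}-\d{4}\b")),
]


def redact(text: str) -> str:
    """Replace each matched identifier with a bracketed placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact("Contact aina@example.com, NRIC 900101-14-5678."))
```

Text that passes through `redact` can still contain personal data, so the resulting logs and embeddings should keep the same retention and encryption treatment as the originals.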

A deployment checklist

Before you ship a customer-facing AI workload in Malaysia, walk this list:

Document the legal basis for every training source. Consent records must be auditable.
Map every outbound call. If a request leaves Malaysian infrastructure, you have a disclosure event and a sub-processor relationship.
Encrypt end-to-end. Keys must be HSM-managed and never exported. Logs and embeddings must inherit the same encryption posture.
Set retention windows. Inference logs and embeddings should auto-delete after the documented purpose expires.
Provide DSAR support. Build the tooling to find and delete every trace of a data subject — including embeddings and derived inferences.
Run a PDPA-aligned audit. Review logs, retention, encryption, and access controls against the seven principles and produce a signed report.
Disclose sub-processors. Maintain a public sub-processor list and notify customers of changes.
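
The first item on the list, a documented legal basis for every training source, can be checked mechanically before a fine-tuning job starts. The sketch below assumes a simple in-house register; the `TrainingSource` fields and the `blocked_sources` helper are hypothetical names, and the basis labels are examples rather than an exhaustive list of PDPA grounds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrainingSource:
    name: str
    legal_basis: Optional[str]   # e.g. "consent" or "contract"; None if undocumented
    evidence_ref: Optional[str]  # pointer to the auditable record, if any


def blocked_sources(sources: List[TrainingSource]) -> List[str]:
    """Names of sources that lack a documented basis or auditable evidence."""
    return [
        s.name for s in sources
        if not s.legal_basis or not s.evidence_ref
    ]


if __name__ == "__main__":
    register = [
        TrainingSource("support_tickets", "consent", "consent-log/2024-q3"),
        TrainingSource("email_archive", None, None),
    ]
    print(blocked_sources(register))  # lists the sources to exclude from training
```

Failing the training pipeline whenever `blocked_sources` returns a non-empty list turns the checklist item into a gate rather than a reminder.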

How Bijak Cloud maps to PDPA

Every layer of Bijak Cloud is engineered against the seven principles. Training data stays in your tenant; inference endpoints run inside Malaysian data centres; embeddings are encrypted with HSM-managed keys and have configurable retention windows; audit logs are immutable and exportable; and DSAR tooling is built into the platform. When regulators ask, you produce one signed report, not fifty screenshots.

Read the Sovereign AI compliance guide →