The news broke this week that Samsung semiconductor employees leaked confidential data through ChatGPT on at least three separate occasions within a span of just 20 days. Engineers pasted proprietary source code to get debugging help, shared internal meeting notes for summarization, and submitted semiconductor test data for analysis. Samsung has since restricted internal ChatGPT use and is reportedly developing its own in-house AI tool. This story encapsulates a problem that every organization using — or trying to prevent the use of — LLM services needs to confront head-on.
## What Actually Happened
According to Korean media outlet Economist Korea, the leaks occurred shortly after Samsung’s semiconductor division lifted an earlier ban on ChatGPT. Three incidents were reported:
- An engineer pasted proprietary source code into ChatGPT to check for bugs
- Another employee submitted code related to semiconductor equipment and asked ChatGPT to optimize it
- A third recorded an internal meeting, transcribed it, and fed it to ChatGPT for meeting minutes
In each case, the employees were using ChatGPT as a productivity tool — the same way millions of developers and knowledge workers are using it right now. The problem is that anything submitted through the standard ChatGPT interface can, by default, be used by OpenAI for model training, and there's no practical way to retrieve or delete it once it has been sent.
Samsung’s response was swift: restrict ChatGPT usage to prompts under 1,024 bytes (effectively making it useless for code tasks), threaten disciplinary action for future violations, and accelerate development of an internal alternative.
## The Systemic Problem
Samsung isn’t unique here. They’re just the first major company to have their internal AI mishap become public. I’d wager that similar incidents are happening at thousands of companies right now, most of them undetected.
The fundamental issue is a collision between two forces:
Developer productivity gains are real. ChatGPT and similar tools genuinely make people more productive. The Samsung engineers weren’t being negligent for fun — they were trying to do their jobs better and faster. When you’re staring at a bug at 11 PM, the temptation to paste your code into the best debugging assistant ever created is enormous.
Data governance hasn’t caught up. Most organizations’ data classification and handling policies were written for a world where the primary risks were email attachments and USB drives. They don’t account for a scenario where an employee can exfiltrate sensitive data with a browser tab and good intentions.
This creates a shadow IT problem of unprecedented scale. Even if your company has an official policy prohibiting ChatGPT use, how do you enforce it? The service runs in a browser. There’s no installable client to block. You can restrict the domain at the network level, but employees have phones. And if you’re too heavy-handed with restrictions, your engineers will just use it at home on their personal devices — with less oversight, not more.
## Building an Enterprise AI Policy That Works
Based on conversations I’ve been having with CISOs and engineering leaders, here’s what a pragmatic approach looks like:
Classify your data explicitly. Engineers need to know, in concrete terms, what can and cannot be shared with external AI services. “Confidential data” is too vague. Define it: source code from repositories X, Y, Z — never. Internal documentation — never. Public API documentation — acceptable. Stack traces with identifiers stripped — acceptable with review.
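One way to make that classification enforceable rather than aspirational is to encode it as data that tooling can consult, so a pre-commit hook, chat proxy, or IDE plugin all apply the same rules. Here's a minimal sketch in Python; the category names and the `can_share` helper are hypothetical examples, not any particular vendor's API.

```python
# Illustrative sketch: encode the sharing policy as data so different tools
# (pre-commit hooks, chat proxies, IDE plugins) consult the same rules.
# Category names and labels are hypothetical examples.
from dataclasses import dataclass
from enum import Enum


class Sharing(Enum):
    NEVER = "never"                    # must not leave the company
    REVIEW = "acceptable with review"
    OK = "acceptable"


@dataclass(frozen=True)
class Rule:
    label: str
    sharing: Sharing


POLICY = {
    "internal_source_code": Rule("Source code from internal repositories", Sharing.NEVER),
    "internal_docs": Rule("Internal documentation and meeting notes", Sharing.NEVER),
    "public_api_docs": Rule("Public API documentation", Sharing.OK),
    "sanitized_stack_trace": Rule("Stack traces with identifiers stripped", Sharing.REVIEW),
}


def can_share(category: str) -> Sharing:
    """Look up how a category of data may be shared with external AI services.

    Unknown categories default to NEVER: fail closed, not open.
    """
    rule = POLICY.get(category)
    return rule.sharing if rule else Sharing.NEVER


if __name__ == "__main__":
    print(can_share("public_api_docs"))         # Sharing.OK
    print(can_share("internal_source_code"))    # Sharing.NEVER
    print(can_share("something_unclassified"))  # Sharing.NEVER (fail closed)
```

The detail that matters is the default: anything you haven't classified yet should be treated as unshareable until someone decides otherwise.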
Provide sanctioned alternatives. Banning ChatGPT without providing an alternative is like banning Stack Overflow — people will find workarounds. The better approach is to offer approved tools with proper data handling. OpenAI’s API with the data usage opt-out, Azure OpenAI Service with enterprise data protection, or self-hosted models like those based on LLaMA are all viable options depending on your sensitivity requirements.
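For teams already on Azure, the switch can be as small as pointing code at the sanctioned endpoint instead of the consumer ChatGPT UI. A rough sketch using the `openai` Python SDK's 0.x-era Azure support; the endpoint, deployment name, and environment variable names are placeholders your platform team would define.

```python
# Minimal sketch of routing prompts through a sanctioned endpoint (here Azure
# OpenAI Service, which comes with enterprise data-handling terms) instead of
# the consumer ChatGPT interface. Endpoint, deployment name, and environment
# variable names below are placeholders.
import os

import openai

openai.api_type = "azure"
openai.api_base = os.environ["COMPANY_AOAI_ENDPOINT"]  # e.g. https://yourco.openai.azure.com
openai.api_version = "2023-03-15-preview"
openai.api_key = os.environ["COMPANY_AOAI_KEY"]


def ask_sanctioned_llm(prompt: str) -> str:
    """Send a prompt to the company-approved deployment, not the public ChatGPT UI."""
    response = openai.ChatCompletion.create(
        engine="gpt-35-turbo-internal",  # placeholder deployment name set by the platform team
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(ask_sanctioned_llm("Explain what a null pointer dereference is, in two sentences."))
```

The point isn't this particular SDK; it's that the approved path should be as easy to use as the forbidden one, or people won't take it.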
Implement technical controls where possible. DLP (Data Loss Prevention) tools can be configured to flag or block submissions to known AI service domains. Browser extensions can intercept paste events on certain sites. These aren’t foolproof, but they add friction that reduces accidental exposure.
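Even a crude pre-submission check adds useful friction. The sketch below shows the flavor: scan outgoing prompt text for patterns that suggest secrets or internal material before it leaves the network. The patterns and the placeholder domain are illustrative only, not a real DLP rule set.

```python
# Illustrative DLP-style check: flag prompt text that looks like it contains
# sensitive material before it is sent to an external AI service.
# The patterns and the example.com domain below are made-up examples; a real
# deployment would use your DLP vendor's rules.
import re

SUSPICIOUS_PATTERNS = [
    re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),       # embedded private keys
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                         # AWS access key IDs
    re.compile(r"\b[\w.-]+@(?:corp|internal)\.example\.com\b"),  # internal addresses (placeholder domain)
    re.compile(r"\b(?:proprietary|confidential|internal only)\b", re.IGNORECASE),
]


def flag_sensitive(prompt: str) -> list[str]:
    """Return human-readable reasons the prompt looks risky (empty list if none)."""
    findings = []
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(prompt):
            findings.append(f"matched pattern: {pattern.pattern}")
    return findings


if __name__ == "__main__":
    risky = "Here is our internal only build script and key AKIAABCDEFGHIJKLMNOP"
    for reason in flag_sensitive(risky):
        print("FLAGGED:", reason)
```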
Train your people. The Samsung engineers likely had no idea their prompts could be used for training. A 30-minute security awareness session focused specifically on AI tool risks might well have prevented all three incidents.
## The Broader Implications
This incident is accelerating a trend I’ve been watching: the enterprise AI stack is going to look very different from the consumer AI stack. Companies with serious IP concerns — semiconductor, pharma, defense, finance — are going to demand:
- On-premises or VPC-deployed models where data never leaves their infrastructure
- Contractual guarantees that prompt data isn’t used for training
- Audit trails showing what data was submitted and by whom (see the sketch after this list)
- Model isolation ensuring their fine-tuned models aren’t accessible to other customers
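Of those four, the audit trail is the one you can approximate today without waiting on vendors: route every call through an internal gateway that records who submitted what, and when. A minimal sketch, assuming a sanctioned client like the one earlier in this post; the log path and record fields are placeholders.

```python
# Sketch of an audit-trail wrapper: every prompt that leaves the company is
# logged with the submitting user, a timestamp, and a hash of the content.
# Storing a hash rather than the raw prompt keeps the audit log itself from
# becoming a second copy of the sensitive data. Path and fields are placeholders.
import hashlib
import json
import time
from typing import Callable

AUDIT_LOG_PATH = "/var/log/llm-gateway/audit.jsonl"  # placeholder location


def audited_call(user: str, prompt: str, llm_call: Callable[[str], str]) -> str:
    """Invoke llm_call(prompt) and append an audit record before returning."""
    record = {
        "user": user,
        "timestamp": time.time(),
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt_bytes": len(prompt.encode("utf-8")),
    }
    response = llm_call(prompt)
    with open(AUDIT_LOG_PATH, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return response
```

The pattern matters more than the code: if every outbound prompt passes through something you control, you can answer "what have we already shared?" with logs instead of guesses.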
OpenAI’s enterprise offerings, Azure OpenAI Service, and the emerging open-source model ecosystem are all responses to this demand. But we’re still in the early days of figuring out the right architecture and governance model.
## My Take
I have sympathy for those Samsung engineers. They did what any curious, productivity-minded developer would do — they used the best tool available to solve their immediate problem. The failure isn’t individual; it’s organizational. If your security policy can be violated by a well-meaning employee using a browser, your policy is insufficient.
The answer isn’t to ban AI tools. That ship has sailed. The answer is to build infrastructure and policies that let your team use AI productively without putting your IP at risk. That means investing in self-hosted models, deploying enterprise-grade AI services with proper data handling, and treating AI governance as a first-class security concern — not an afterthought.
Every engineering leader should be asking right now: “If my team is using ChatGPT — and they probably are — what data have they already shared?” The answer might be uncomfortable, but it’s better to find out on your own terms than to read about it in the press.
This post is part of my Security in Practice series, exploring real-world security challenges in software engineering.
