Corporate Data Security at Risk From ‘Shadow AI’ Accounts

The growing use of artificial intelligence in the workplace is fueling a rapid increase in data consumption, challenging companies’ ability to safeguard sensitive data.

A report released in May from data security firm Cyberhaven, titled “The Cubicle Culprits,” sheds light on AI adoption trends and their correlation to heightened risk. Cyberhaven’s analysis drew on a dataset of usage patterns from three million workers to assess AI adoption and its implications in the corporate environment.

The rapid rise of AI mimics previous transformative shifts, such as the internet and cloud computing. Just as early cloud adopters navigated new challenges, today’s companies must contend with the complexities introduced by widespread AI adoption, according to Cyberhaven CEO Howard Ting.

“Our research on AI usage and risks not only highlights the impact of these technologies but also underscores the emerging risks that could parallel those encountered during significant technological upheavals in the past,” he told TechNewsWorld.

Findings Suggest Alarm Over Potential for AI Abuses

The Cubicle Culprits report reveals that AI adoption in the workplace is accelerating rapidly and that use by end users is outpacing corporate IT. This trend, in turn, fuels risky “shadow AI” accounts that take in more types of sensitive company data.

Products from three AI tech giants — OpenAI, Google, and Microsoft — dominate AI usage. Their products account for 96% of AI usage at work.

According to the research, the amount of sensitive corporate data that workers worldwide entered into AI tools increased by an alarming 485% from March 2023 to March 2024. Even so, adoption is still early on the curve. Only 4.7% of employees at financial firms, 2.8% in pharma and life sciences, and 0.6% at manufacturing firms use AI tools.

“A significant 73.8% of ChatGPT usage at work occurs through non-corporate accounts. Unlike enterprise versions, these accounts incorporate shared data into public models, posing a considerable risk to sensitive data security,” warned Ting.

“A substantial portion of sensitive corporate data is being sent to non-corporate accounts. This includes roughly half of the source code [50.8%], research and development materials [55.3%], and HR and employee records [49.0%],” he said.

Data shared through these non-corporate accounts are incorporated into public models. The percentage of non-corporate account usage is even higher for Gemini (94.4%) and Bard (95.9%).

AI Data Hemorrhaging Uncontrollably

This trend indicates a critical vulnerability. Ting said that non-corporate accounts lack the robust security measures needed to protect such data.

AI adoption rates are rapidly reaching new departments and use cases involving sensitive data. Some 27% of data that employees put into AI tools is sensitive, up from 10.7% a year ago.

For example, 82.8% of legal documents employees put into AI tools went to non-corporate accounts, potentially exposing the information publicly.

Ting cautioned that including patented material in content generated by AI tools poses increasing risks. Source code insertions generated by AI outside of coding tools can create the risk of vulnerabilities.

Some companies are clueless about stopping the flow of unauthorized and sensitive data exported to AI tools beyond IT’s reach. They rely on existing data security tools that only scan the data’s content to identify its type.

“What’s been missing is the context of where the data came from, who interacted with it, and where it was stored. Consider the example of an employee pasting code into a personal AI account to help debug it,” offered Ting. “Is it source code from a repository? Is it customer data from a SaaS application?”

Controlling Data Flow Is Possible

Educating workers about the data leakage problem is a viable part of the solution if done correctly, assured Ting. Most companies have rolled out periodic security awareness training.

“However, the videos workers have to watch twice a year are soon forgotten. The education that works best is correcting bad behavior immediately in the moment,” he offered.

Cyberhaven found that when workers receive a popup message coaching them during risky activities, like pasting source code into a personal ChatGPT account, ongoing bad behavior decreases by 90%, Ting said.

His company’s technology, Data Detection and Response (DDR), understands how data moves and uses that context to protect sensitive data. The technology also understands the difference between a corporate and personal account for ChatGPT.

This capability enables companies to enforce a policy that blocks employees from pasting sensitive data into personal accounts while allowing that data to flow to enterprise accounts.
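
To make the idea concrete, here is a minimal Python sketch of such a policy. It is an illustration only, not Cyberhaven’s implementation: the names `PasteEvent` and `decide`, the origin labels, and the account labels are all hypothetical, chosen to show how a decision could combine where the data came from with the type of account it is headed to.

```python
from dataclasses import dataclass

# Hypothetical origin labels treated as sensitive; not taken from any real product.
SENSITIVE_ORIGINS = {"source_repository", "hr_system", "customer_saas"}


@dataclass(frozen=True)
class PasteEvent:
    origin: str        # where the data came from, e.g. "source_repository"
    account_type: str  # destination AI account: "corporate" or "personal"


def decide(event: PasteEvent) -> str:
    """Return "block" for sensitive data headed to a non-corporate account, else "allow"."""
    is_sensitive = event.origin in SENSITIVE_ORIGINS
    if is_sensitive and event.account_type != "corporate":
        return "block"
    return "allow"


if __name__ == "__main__":
    print(decide(PasteEvent("source_repository", "personal")))
    print(decide(PasteEvent("source_repository", "corporate")))
    print(decide(PasteEvent("public_web_page", "personal")))
```

In this sketch, the decision depends on the data’s origin rather than on scanning its content, which mirrors the point Ting makes about context. A real system would need a far richer notion of origin and account identity than these fixed labels.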

Surprising Twist in Who’s at Fault

Cyberhaven analyzed the prevalence of insider risks based on workplace arrangements, including remote, onsite, and hybrid. Researchers found that a worker’s location impacts the data spread when a security incident occurs.

“Our research uncovered a surprising twist in the narrative. In-office employees, traditionally considered the safest bet, are now leading the charge in corporate data exfiltration,” he revealed.

Counterintuitively, office-based workers are 77% more likely than their remote counterparts to exfiltrate sensitive data. However, when office-based workers log in from offsite, they are 510% more likely to exfiltrate data than when onsite, making this the riskiest time for corporate data, according to Ting.
