Zoom's plans earlier this year to use customer data and call recordings to train its artificial intelligence services sparked outrage among privacy advocates. The video conferencing company quickly backtracked and updated its privacy policy to let users opt out of having their data used for AI development. But given Zoom's history of misleading claims and sharing user information without permission, should we trust this policy shift?
The rise of powerful AI systems like ChatGPT has highlighted how much data is needed to develop and refine these technologies. According to OpenAI's own figures, GPT-3, the model behind the original ChatGPT, was trained on roughly 570 gigabytes of filtered text, about 300 billion words. Google's LaMDA model was pre-trained on 1.56 trillion words of dialog and web text. By some estimates, the computational power and data needed for advanced AI are doubling every few months.
Tech giants like Meta, Google and Apple are racing to collect ever more customer data to remain competitive in AI development. By one analysis, Meta's apps collect 98 data points per user on average. Apple reportedly receives up to a billion Siri requests per month, each of which can help refine its AI. Smaller companies like Zoom are at a disadvantage in amassing datasets of that scale.
Zoom's Rocky Relationship with User Privacy
Zoom's attempt earlier this year to claim broad rights to customer data for AI training reminded many of the company's troubled history around privacy and transparency.
In 2020, Zoom was forced to admit its video calls were not end-to-end encrypted as claimed. The company later settled a class action lawsuit for $85 million over sharing user data with third parties like Facebook without permission between 2016 and 2021.
Legal experts noted Zoom's surreptitious data sharing practices very likely violated wiretapping laws prohibiting unauthorized recordings of private communications.
Given this track record, it's understandable why Zoom's March 2023 terms-of-service update, which granted the company a "perpetual, royalty-free" license over customer content, triggered immediate backlash once users noticed it. Online privacy advocates quickly spotlighted clauses allowing Zoom to use transcripts, polls, whiteboard content and recordings for AI model development.
After public criticism, Zoom updated its terms in August 2023 to allow opting out of AI data usage. But some privacy experts remain skeptical. Will Zoom uphold this policy if large numbers of users opt out, hampering its AI development? Could subtle future changes nullify the opt-out down the line? Zoom's contradictory language about still using data to improve its own products and services raises questions. Ongoing vigilance over privacy policies is needed.
Opaque Data Practices Still the Norm
U.S. privacy laws remain relatively limited compared to regions like Europe, and federal legislation has not kept pace with AI technologies' voracious data needs. Outside of sector-specific regulations like HIPAA for healthcare, companies currently have very broad latitude over collecting and monetizing consumer data for purposes like AI development. Ethics are often an afterthought.
For example, sites like Reddit have allowed AI researchers to freely scrape massive volumes of public user content for model training, despite questions around permissions and copyright.
Legal experts note that U.S. wiretapping laws may prohibit unauthorized recordings of phone or video conversations from being used to train AI without consent. But companies can argue derived data like transcripts fall into more of a legal gray area.
While some U.S. lawmakers have proposed updating the Electronic Communications Privacy Act to boost digital privacy rights, comprehensive federal AI regulation still seems far off. Unless citizens demand change, access to the huge troves of data AI systems require will likely continue to be an unregulated free-for-all.
Emerging Alternatives to Customer Data
However, researchers are exploring methods like differential privacy, federated learning and synthetic data generation that could reduce reliance on customer data to develop AI models.
Differential privacy adds controlled "noise" to datasets to obscure identifying details. Federated learning allows models to be trained across decentralized devices without aggregating user data. And synthetic data uses algorithms to automatically generate artificial datasets for training.
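To make the first of these concrete, here is a minimal sketch of differential privacy's classic Laplace mechanism in Python. The dataset, the query and the epsilon value are all invented for illustration; they are not drawn from any real product:

```python
import numpy as np

def laplace_count(data, predicate, epsilon=1.0):
    """Answer a count query with epsilon-differential privacy.

    Adding or removing one person changes a count by at most 1
    (sensitivity = 1), so Laplace noise with scale 1/epsilon
    hides any individual's presence in the dataset.
    """
    true_count = sum(1 for row in data if predicate(row))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Toy dataset: ages of meeting participants (illustrative only).
ages = [23, 35, 41, 29, 52, 47, 33, 60]

# "How many participants are over 40?" is released with noise added,
# so no single person's inclusion can be inferred from the answer.
print(laplace_count(ages, lambda age: age > 40, epsilon=0.5))
```

A smaller epsilon means more noise and stronger privacy, so analysts trade answer accuracy against a "privacy budget."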
While still nascent, techniques like these could enable AI advancement with less compromise of customer privacy. However, most companies lack incentive to adopt them without regulatory mandates.
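Federated learning can be illustrated with a toy example too. In this sketch, three simulated clients each fit a shared linear model on their own private data and return only updated weights to the server; the data, model size, learning rate and round counts are all made up for demonstration, not a production federated system:

```python
import numpy as np

rng = np.random.default_rng(42)

def local_update(weights, X, y, lr=0.1, steps=10):
    """One client's training: gradient steps on its private data."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

# Three clients, each holding data that never leaves the "device".
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

# Federated averaging: the server sends out weights, each client trains
# locally and returns new weights, and the server averages them.
# Only model parameters cross the wire, never the raw data.
global_w = np.zeros(2)
for _ in range(20):
    updates = [local_update(global_w, X, y) for X, y in clients]
    global_w = np.mean(updates, axis=0)

print("learned:", global_w, "true:", true_w)
```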
How Can You Better Protect Your Privacy?
Until stronger legal protections exist, individuals need to take steps to minimize how much of their data is harvested by AI development pipelines. Some best practices include:
- Be extremely wary of "free" services or apps collecting your info. Read privacy policies carefully.
- Use privacy-focused browsers like Brave rather than Chrome or Safari, which collect more user data by default.
- Disable microphone and camera access for apps when not in use. Limit the personal info you give to AI bots.
- Switch to secure messaging apps like Signal, or Telegram's end-to-end encrypted secret chats, instead of Meta-owned WhatsApp or Messenger.
- Use a VPN on all your devices to encrypt traffic and mask your IP address from trackers.
- Disable location services and limit ad tracking permissions for apps. Use a firewall app.
- Opt out of data collection where possible. Be selective in sharing personal data online.
- For businesses: conduct regular risk assessments of conferencing platforms (a small automation sketch follows this list) and train staff on privacy hygiene.
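Parts of a platform risk assessment can be scripted. As a minimal illustration (not a substitute for a real assessment), this Python sketch fetches a vendor's homepage and reports whether a few baseline HTTP security headers are present. The domain and the header list are placeholders:

```python
import requests

SECURITY_HEADERS = [
    "Strict-Transport-Security",  # enforces HTTPS on return visits
    "Content-Security-Policy",    # limits script injection
    "X-Content-Type-Options",     # blocks MIME-type sniffing
]

def check_headers(domain):
    """Fetch a vendor's homepage and report which security headers are set."""
    resp = requests.get(f"https://{domain}", timeout=10)
    return {h: resp.headers.get(h, "MISSING") for h in SECURITY_HEADERS}

# Placeholder domain: substitute the platform under review.
for header, value in check_headers("example.com").items():
    print(f"{header}: {value}")
```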
The onus remains heavily on individuals and organizations to protect privacy in the absence of robust regulations on AI data gathering. While Zoom's policy shift shows public pressure can instigate change, continued vigilance is required. Companies will not self-regulate or prioritize ethics without strong external forces demanding it.