AI tools collect and store data about you from all your devices, here's how to be aware of what you're revealing

Like it or not, artificial intelligence has become part of daily life. Many devices, including electric razors and toothbrushes, have become "AI-powered," using machine learning algorithms to track how a person uses the device, how the device is working in real time, and provide feedback. From asking questions to an AI assistant like ChatGPT or Microsoft Copilot to monitoring a daily fitness routine with a smartwatch, many people use an AI system or tool every day. While AI tools and technologies can make life easier, they also raise important questions about data privacy. These systems often collect large amounts of data, sometimes without people even realizing their data is being collected. The information can then be used to identify personal habits and preferences, and even predict future behaviours by drawing inferences from the aggregated data.
As an assistant professor of cybersecurity at West Virginia University, I study how emerging technologies and various types of AI systems manage personal data and how we can build more secure, privacy-preserving systems for the future. Generative AI software uses large amounts of training data to create new content such as text or images. Predictive AI uses data to forecast outcomes based on past behaviour, such as how likely you are to hit your daily step goal, or what movies you may want to watch. Both types can be used to gather information about you.
How AI tools collect data
Generative AI assistants such as ChatGPT and Google Gemini collect all the information users type into a chat box. Every question, response and prompt that users enter is recorded, stored and analysed to improve the AI model. OpenAI's privacy policy informs users that "we may use content you provide us to improve our Services, for example to train the models that power ChatGPT." Even though OpenAI allows you to opt out of content use for model training, it still collects and retains your personal data. Although some companies promise that they anonymise this data, meaning they store it without naming the person who provided it, there is always a risk of data being reidentified.
Predictive AI
Beyond generative AI assistants, social media platforms like Facebook, Instagram and TikTok continuously gather data on their users to train predictive AI models. Every post, photo, video, like, share and comment, including the amount of time people spend looking at each of these, is collected as data points that are used to build digital data profiles for each person who uses the service. The profiles can be used to refine the social media platform's AI recommender systems. They can also be sold to data brokers, who sell a person's data to other companies to, for instance, help develop targeted advertisements that align with that person's interests. Many social media companies also track users across websites and applications by putting cookies and embedded tracking pixels on their computers. Cookies are small files that store information about who you are and what you clicked on while browsing a website. One of the most common uses of cookies is in digital shopping carts: When you place an item in your cart, leave the website and return later, the item will still be in your cart because the cookie stored that information. Tracking pixels are invisible images or snippets of code embedded in websites that notify companies of your activity when you visit their page. This helps them track your behaviour across the internet. This is why users often see or hear advertisements that are related to their browsing and shopping habits on many of the unrelated websites they browse, and even when they are using different devices, including computers, phones and smart speakers. One study found that some websites can store over 300 tracking cookies on your computer or mobile phone.
Data privacy controls - and limitations
Like generative AI platforms, social media platforms offer privacy settings and opt-outs, but these give people limited control over how their personal data is aggregated and monetized. As media theorist Douglas Rushkoff argued in 2011, if the service is free, you are the product. Many tools that include AI don't require a person to take any direct action for the tool to collect data about that person. Smart devices such as home speakers, fitness trackers and watches continually gather information through biometric sensors, voice recognition and location tracking. Smart home speakers continually listen for the command to activate or "wake up" the device. As the device is listening for this word, it picks up all the conversations happening around it, even though it does not seem to be active. Some companies claim that voice data is only stored when the wake word - what you say to wake up the device - is detected. However, people have raised concerns about accidental recordings, especially because these devices are often connected to cloud services, which allow voice data to be stored, synced and shared across multiple devices such as your phone, smart speaker and tablet. If the company allows, it's also possible for this data to be accessed by third parties, such as advertisers, data analytics firms or a law enforcement agency with a warrant.
Privacy rollbacks
This potential for third-party access also applies to smartwatches and fitness trackers, which monitor health metrics and user activity patterns. Companies that produce wearable fitness devices are not considered "covered entities" and so are not bound by the Health Information Portability and Accountability Act. This means that they are legally allowed to sell health- and location-related data collected from their users. Concerns about HIPAA data arose in 2018, when Strava, a fitness company released a global heat map of user's exercise routes. In doing so, it accidentally revealed sensitive military locations across the globe through highlighting the exercise routes of military personnel. The Trump administration has tapped Palantir, a company that specializes in using AI for data analytics, to collate and analyse data about Americans. Meanwhile, Palantir has announced a partnership with a company that runs self-checkout systems. Such partnerships can expand corporate and government reach into everyday consumer behaviour. This one could be used to create detailed personal profiles on Americans by linking their consumer habits with other personal data. This raises concerns about increased surveillance and loss of anonymity. It could allow citizens to be tracked and analysed across multiple aspects of their lives without their knowledge or consent.
Some smart device companies are also rolling back privacy protections instead of strengthening them. Amazon recently announced that starting on March 28, 2025, all voice recordings from Amazon Echo devices would be sent to Amazon's cloud by default, and users will no longer have the option to turn this function off. This is different from previous settings, which allowed users to limit private data collection. Changes like these raise concerns about how much control consumers have over their own data when using smart devices. Many privacy experts consider cloud storage of voice recordings a form of data collection, especially when used to improve algorithms or build user profiles, which has implications for data privacy laws designed to protect online privacy.
Implications for data privacy
All of this brings up serious privacy concerns for people and governments on how AI tools collect, store, use and transmit data. The biggest concern is transparency. People don't know what data is being collected, how the data is being used, and who has access to that data. Companies tend to use complicated privacy policies filled with technical jargon to make it difficult for people to understand the terms of a service that they agree to. People also tend not to read terms of service documents. One study found that people averaged 73 seconds reading a terms of service document that had an average read time of 29-32 minutes. Data collected by AI tools may initially reside with a company that you trust, but can easily be sold and given to a company that you don't trust. AI tools, the companies in charge of them and the companies that have access to the data they collect can also be subject to cyberattacks and data breaches that can reveal sensitive personal information. These attacks can by carried out by cybercriminals who are in it for the money, or by so-called advanced persistent threats, which are typically nation/state- sponsored attackers who gain access to networks and systems and remain there undetected, collecting information and personal data to eventually cause disruption or harm.
While laws and regulations such as the General Data Protection Regulation in the European Union and the California Consumer Privacy Act aim to safeguard user data, AI development and use have often outpaced the legislative process. The laws are still catching up on AI and data privacy. For now, you should assume any AI-powered device or platform is collecting data on your inputs, behaviours and patterns.
Using AI tools
Although AI tools collect people's data, and the way this accumulation of data affects people's data privacy is concerning, the tools can also be useful. AI-powered applications can streamline workflows, automate repetitive tasks and provide valuable insights. But it's crucial to approach these tools with awareness and caution.
When using a generative AI platform that gives you answers to questions you type in a prompt, don't include any personally identifiable information, including names, birth dates, Social Security numbers or home addresses. At the workplace, don't include trade secrets or classified information. In general, don't put anything into a prompt that you wouldn't feel comfortable revealing to the public or seeing on a billboard. Remember, once you hit enter on the prompt, you've lost control of that information. Remember that devices which are turned on are always listening - even if they're asleep. If you use smart home or embedded devices, turn them off when you need to have a private conversation. A device that's asleep looks inactive, but it is still powered on and listening for a wake word or signal. Unplugging a device or removing its batteries is a good way of making sure the device is truly off. Finally, be aware of the terms of service and data collection policies of the devices and platforms that you are using. You might be surprised by what you've already agreed to.

Hashtags

#ChatGPT

#MicrosoftCopilot

#GoogleGemini

#OpenAI

#WestVirginiaUniversity

#Facebook

Try Our AI Features

Explore what Daily8 AI can do for you:

Comments

No comments yet...

'They copy-pasted from AI': Tech company offers Rs 20 lakh, can't find a single techie who understands code

Economic Times

an hour ago

Economic Times

'They copy-pasted from AI': Tech company offers Rs 20 lakh, can't find a single techie who understands code

You Might Also Like: AI cannot replace all jobs, says expert: 3 types of careers that could survive the automation era A technology firm recently launched a hiring campaign for a well-compensated entry-level position offering a handsome Rs 20 lakh per annum. However, despite conducting a staggering 450 interviews, the company failed to find a single candidate suitable for the role. The recruitment team turned to the Developers India subreddit to share their ordeal, hoping to shed light on the complexities of hiring in the AI company had posted job listings on LinkedIn for junior-level frontend and backend developers, as well as QA roles. The salary range—up to Rs 20 lakh—attracted a wave of over 12,000 applications. From the outset, the hiring team filtered out nearly 10,000 applicants, citing reasons such as poorly tailored resumes and a lack of relevant technical abilities. According to them, the early elimination was not about being overly selective but about saving both their own time and the applicants' from fruitless interview who did make it to the interviews were tested on fundamental programming principles as well as standard data structures and algorithms topics like trees, heaps, linked lists, and graph traversal methods such as breadth-first search and depth-first search. Interestingly, the firm even permitted the use of tools like ChatGPT during assessments to simulate a real-world working this modern approach backfired. While candidates were quick to churn out working solutions—often copied directly from AI—the problems began when interviewers asked for an explanation. Most were unable to describe what their own code was doing or provide details about its time and space complexity. This led to the realization that many candidates were simply copying and pasting without comprehending the logic behind the code—a phenomenon the recruiter described as "vibe coding."This troubling pattern prompted the company to reflect on its own methods. Was the interview process too rigid or flawed? Or was it indicative of a larger issue where aspiring developers rely too heavily on AI tools, skipping the foundational learning necessary to become competent programmers?The Reddit community didn't hold back in its response. Some users questioned the company's recruitment practices, pointing out that spending 450 hours on interviews without hiring a single person suggested deeper internal problems. One user criticized the process as inefficient and misguided, arguing that the HR team might be more at fault than the candidates themselves. Another suggested that if so many interviews yield zero hires, it could be a sign that the hiring strategy—and not the talent pool—is hiring saga mirrors a broader anxiety echoed by Godfather of AI, Geoffrey Hinton, who recently warned that roles relying heavily on intellectual replication—like coding without understanding—are among the first to be replaced by AI. As companies seek talent that can think, not just code, this recruitment roadblock may signal a growing divide between tool-dependent developers and genuinely skilled professionals.

With ads and paid subscriptions, is WhatsApp Meta's next big bet?

Business Standard

an hour ago

Business Standard

With ads and paid subscriptions, is WhatsApp Meta's next big bet?

After the success of bringing businesses to WhatsApp, Meta — its parent — is now introducing ads and subscriptions to Status and Channels under the Updates tab of the messaging app, the company said on Monday. This move will further push its monetisation efforts on WhatsApp. The Updates tab is now used by 1.5 billion users every day. Users who follow certain Channels will be able to receive exclusive updates for a monthly fee. The Status tab will also begin featuring advertisements from businesses, enabling easier connection with users. This is the first time that businesses will be able to run advertisements directly on WhatsApp. Previously, there were only two ways to target users via WhatsApp: first, through paid messaging used by large businesses to send updates, and second, via ads on Facebook and Instagram that click through to open a WhatsApp chat. In a select media briefing, the company stated that for users who only use WhatsApp to chat with friends and family, there will be no change. To show ads in Status and Channels, the platform will share minimal information such as the user's country or city, language, and the Channels they follow. Alice Newton-Rex, Vice-President, Product at WhatsApp, said the platform will never share or sell users' phone numbers to advertisers. Personal messages, calls and group conversations will continue to be encrypted. 'We are increasingly seeing and hearing people wanting to use WhatsApp for more than just messaging close friends and family. That's part of the reason we introduced the 'Updates' tab as a place for optional experiences on WhatsApp like Channels and Status. Almost 1.5 billion people globally use Updates daily. It's often where people go when they are looking to discover something new,' said Newton-Rex. The new features will roll out gradually over the next few months. The timing of this launch is significant, as WhatsApp has seen its two existing advertisement models scale well — both paid messaging and click-to-WhatsApp ads are now billion-dollar businesses. This is especially relevant for a market like India, where WhatsApp has over 500 million users, and revenue from WhatsApp Business doubled in 2024. Sandhya Devanathan, Head of Meta India and South East Asia, previously told Business Standard that business momentum for WhatsApp in India is 'outpacing' several other countries. An important aspect of the rollout is the focus on preserving individual privacy. At a time when regulators worldwide are scrutinising big tech's use of personal data, WhatsApp maintains that it will share only minimal information with advertisers and that messages will remain encrypted. In response to a question about rising scams and frauds, and how ads on Updates would be safeguarded, Newton-Rex said all ads in Status would need to comply with Meta's advertising standards. 'We review every advertisement against our policies, and in addition, users can report or block businesses on WhatsApp or report individual ads,' she said.

Meta's Llama 3.1 AI model reproduces copyrighted book text, raising legal alarms

Hans India

2 hours ago

Hans India

Meta's Llama 3.1 AI model reproduces copyrighted book text, raising legal alarms

Meta's latest AI model, Llama 3.1, is under scrutiny after a new study revealed it can replicate large portions of copyrighted books—including Harry Potter and the Sorcerer's Stone—with surprising accuracy. Conducted by researchers from Stanford, Cornell, and West Virginia University, the study found that Llama 3.1 has memorized around 42% of the first Harry Potter book and can reproduce 50-word sections correctly nearly half the time. Among five major AI models analyzed for how they processed the Books3 dataset, Llama 3.1—Meta's 70-billion parameter model released in July 2024—was the most prone to output copyrighted content. By contrast, Llama 1 65B, released in February 2023, had memorized only 4.4% of the same book, highlighting a significant increase in verbatim retention over time. The model was also found to reproduce exact excerpts from other iconic works such as The Hobbit and 1984. Experts suspect this could be due to repeated exposure to the same texts during training, possibly sourced from fan sites, academic analyses, or online reviews. Adjustments to Meta's training strategy may have unintentionally worsened the memorization problem. These findings come amid growing legal pressure on AI developers. The New York Times has already filed a lawsuit against OpenAI and Microsoft, accusing them of copyright infringement by training models like ChatGPT on proprietary articles. The lawsuit claims OpenAI's models can not only reproduce content verbatim but also mimic The Times' unique style. For Meta, these revelations may trigger similar legal risks as calls for transparency and ethical AI development intensify across the industry.