GitHub announced on March 25, 2026, that starting April 24, it will use interaction data from Copilot Free, Pro, and Pro+ users to train its AI models. The setting is enabled by default. Unless you find and disable it, your code snippets, prompts, accepted suggestions, file names, repository structure, navigation patterns, and feedback all become training data for Microsoft-owned models.
That is the headline version. Every outlet covering this story leads with the opt-out instructions and moves on. But the actual policy changes, buried across three separate documents (a blog post, a changelog entry, and a revised Privacy Statement and Terms of Service), reveal a more complicated picture than what GitHub wants developers to see.
The Private Repository Loophole
GitHub states clearly that it does not use private repository content “at rest” to train AI models. That sounds reassuring until you read what “at rest” actually means.
CPO Mario Rodriguez addressed this directly in the announcement: the phrase “at rest” is used deliberately because Copilot does process code from private repositories when you are actively using it. This interaction data is required to run the service and could be used for model training unless you opt out.
Your private repository code stored on GitHub servers will not be scanned and fed into training. That is the “at rest” protection. But the moment you open VS Code and Copilot generates a suggestion based on your private code, that interaction becomes training data. The code context around your cursor, the file names, the repository structure, the suggestion Copilot generated, and whether you accepted or rejected it: all of that is now fair game.
For a solo developer working on a side project, the stakes may be low. For a freelancer writing proprietary code on a personal Copilot Pro account using private repos, the distinction between “at rest” and “active session” data is the difference between protected and exposed. The policy does not distinguish between hobby code and commercial work. It distinguishes between plans.
The Full Data Taxonomy
The top five search results for this story all follow the same template: what changed, who is affected, how to opt out. None of them explain the full taxonomy of data GitHub now collects or why the specific categories matter.
According to the changelog and FAQ, the collected data types include: code snippets and outputs that users accept or modify; inputs sent to Copilot, including prompts; code context surrounding the cursor position; comments and documentation written by users; file names and repository structure; navigation patterns across files and projects; interactions with Copilot features such as chat, inline suggestions, and code review; and feedback such as thumbs-up and thumbs-down ratings on suggestions.
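To make the taxonomy concrete, here is a hypothetical model of what a single collected interaction event might look like, with the policy's exclusions sketched as a predicate. The field names, the `CopilotInteractionEvent` class, and `is_training_eligible` are illustrative inventions, not GitHub's actual telemetry schema or code; only the categories themselves come from the changelog.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical shape of one interaction event, built from the
# categories GitHub's changelog lists. NOT GitHub's real schema.
@dataclass
class CopilotInteractionEvent:
    prompt: str                      # input sent to Copilot
    cursor_context: str              # code surrounding the cursor
    suggestion: str                  # what Copilot generated
    accepted: bool                   # whether the user took it
    file_name: str                   # file names are collected
    repo_structure: List[str]        # repository layout
    navigation_trail: List[str]      # files opened, in order
    feedback: Optional[str] = None   # explicit thumbs up/down, if any

def is_training_eligible(event: CopilotInteractionEvent,
                         opted_out: bool, plan: str) -> bool:
    """Sketch of the stated policy: Business/Enterprise plans and
    opted-out users are excluded; everyone else is in by default."""
    if plan in ("business", "enterprise"):
        return False
    return not opted_out
```

The default-`True` path for Free, Pro, and Pro+ users is the point of the article: eligibility is the default, and exclusion requires either a contract or an explicit opt-out.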
Navigation patterns are the most revealing entry on that list. They show how a developer moves through a codebase, which files they open in what order, how they structure projects, and where they spend time. That is behavioral data about development workflows, not just code.
Accepted and rejected suggestions are equally telling. They expose coding preferences, decision-making habits, and the gap between what a model suggests and what a developer actually writes. Over millions of users, that data set becomes a high-resolution map of how professional software gets built.
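The kind of preference map described above is simple to build once accept/reject signals exist at scale. A minimal sketch, using entirely synthetic events rather than anything GitHub has published about its pipeline:

```python
from collections import defaultdict

# Synthetic (suggestion_kind, accepted) pairs standing in for the
# accept/reject signals the article describes. Illustrative only.
events = [
    ("list_comprehension", True),
    ("list_comprehension", True),
    ("explicit_loop", False),
    ("list_comprehension", False),
    ("explicit_loop", False),
]

def acceptance_rates(events):
    """Acceptance rate per suggestion kind: a crude map of what
    developers actually keep versus what the model proposes."""
    counts = defaultdict(lambda: [0, 0])  # kind -> [accepted, total]
    for kind, accepted in events:
        counts[kind][1] += 1
        counts[kind][0] += int(accepted)
    return {k: acc / tot for k, (acc, tot) in counts.items()}
```

Aggregated over millions of users, even this trivial statistic reveals which patterns developers trust, which is exactly why the data is valuable for training.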
Pay for Privacy: The Enterprise Exemption
Copilot Business and Copilot Enterprise users are completely excluded from data collection for training. Their contracts prohibit it. GitHub honors those commitments. Students and teachers are also exempt.
That creates a two-tier system. Individual developers on Free, Pro, and Pro+ plans are enrolled by default into a program where their work product becomes training material. Enterprise customers who pay more get contractual guarantees that their data stays private.
GitHub frames this as honoring enterprise agreements. Read it as a product strategy and the logic is different: privacy becomes a premium feature. Pay more, get data protection. Pay less (or nothing), and your interactions subsidize model improvements that benefit everyone, including the enterprise customers whose data stays clean.
This is not unique to GitHub. The same pattern exists across the AI tool ecosystem. Anthropic, JetBrains, and Microsoft all operate similar opt-out data use policies for individual users. GitHub cited all three in its FAQ as precedent. The industry has quietly converged on a standard where individual users provide training data by default and enterprise contracts provide the exit.
The Microsoft Data Pipeline
The updated Privacy Statement expands data sharing with GitHub affiliates, which explicitly includes Microsoft. The new affiliates section states that shared data may now be used for developing and improving artificial intelligence and machine learning technologies.
This means Copilot interaction data does not just train GitHub models. It flows to Microsoft for use across their AI products, subject to their own privacy practices. GitHub states this data will not be shared with third-party AI model providers or other independent service providers. But Microsoft is not a third party. Microsoft owns GitHub. The data stays inside the corporate family, which is a very large family that ships Copilot in Windows, Office, Azure, and Bing.
The opt-out preferences travel with the data, according to GitHub. If you opt out on GitHub, Microsoft affiliates must honor that preference. Whether that mechanism works perfectly across every Microsoft product using this data pipeline is a question that GitHub’s announcement does not answer.
The Consent Erosion Timeline
The history of Copilot data policy tells its own story. The original Copilot, launched in 2021, was trained on publicly available code hosted on GitHub. That decision triggered a class-action lawsuit and widespread developer backlash over whether open-source licenses permitted that use.
GitHub responded by eventually excluding user interaction data from training entirely. All plan tiers were protected. Now the policy reverses. Free, Pro, and Pro+ users are once again subject to training data collection. The cycle moves in one direction: from training on public code, to backing off, to training on interaction data with an opt-out.
The pattern tracks the intensifying competition among AI companies. As model performance becomes increasingly dependent on real-world usage data, the economic incentive to collect it overrides prior privacy commitments. GitHub is following the same arc that every AI platform follows: launch with privacy assurances, grow the user base, then monetize the data.
What the Safeguards Do Not Cover
GitHub describes several technical safeguards: automatic filtering to detect and remove API keys, passwords, tokens, and personal information, controls over which repositories Copilot can access, access limited to authorized personnel, and access logs with auditing.
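The credential filtering GitHub describes is typically regex-driven. The sketch below shows the general shape of such a filter; the patterns are simplified textbook examples (a GitHub personal access token prefix, an AWS access key ID, a hard-coded password assignment), not GitHub's actual filters, and real filters need far broader coverage.

```python
import re

# Simplified examples of secret patterns. Illustrative only; these
# are not GitHub's actual filtering rules.
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),     # GitHub personal access token
    re.compile(r"AKIA[0-9A-Z]{16}"),        # AWS access key ID
    re.compile(r"(?i)password\s*=\s*\S+"),  # hard-coded password assignment
]

def redact_secrets(text: str) -> str:
    """Replace anything matching a known secret pattern with a placeholder
    before the text could enter a downstream pipeline."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text
```

The weakness of this approach is visible in the code itself: it only catches secrets that match a known pattern. Credentials in unusual formats, or sensitive business logic that is not a credential at all, pass straight through.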
These protections address the most obvious failure modes: credential leakage and unauthorized access. What they do not address is subtler. The announcement does not specify a minimum interaction threshold before data enters the training pipeline. It does not explain how code is anonymized or whether de-identification techniques can survive the kind of membership inference attacks that have been demonstrated against language models. It does not state whether interactions that occurred before the April 24 effective date will be included.
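The membership inference concern can be illustrated with a toy: a model that has memorized its training data assigns higher likelihood (lower loss) to training members than to unseen strings, and an attacker can exploit that gap. This is a deliberately tiny character-bigram sketch on synthetic data, not an attack on any real model; published attacks apply the same principle to large language models.

```python
import math
from collections import Counter

# Synthetic "training set" and an unseen string. Illustrative only.
train = ["def connect_db(user, pwd):", "api_key = load_secret()"]
unseen = ["fn main() { println!(); }"]

# A trivially overfit "model": character-bigram counts over the
# training set.
bigrams = Counter()
for s in train:
    for a, b in zip(s, s[1:]):
        bigrams[(a, b)] += 1
total = sum(bigrams.values())

def avg_loss(s: str) -> float:
    """Mean negative log-probability of a string's bigrams, with
    add-one smoothing. Training members score lower, and that gap
    is the signal a membership inference attack measures."""
    losses = [-math.log((bigrams[(a, b)] + 1) / (total + 256))
              for a, b in zip(s, s[1:])]
    return sum(losses) / len(losses)
```

If de-identification leaves this likelihood gap intact, an attacker can still test whether a specific snippet was in the training data, which is why the announcement's silence on the question matters.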
For organizations running sensitive projects on individual plans, the safeguards may be insufficient. The only guaranteed protection is the enterprise tier.
What Developers Should Do Before April 24
The opt-out process is straightforward. Navigate to github.com/settings/copilot. Open the Privacy section. Set “Allow GitHub to use my data for AI model training” to Disabled. Save. If you have multiple GitHub accounts, repeat for each one.
The opt-out only stops future collection. It does not retroactively remove data already collected. If you use Copilot on a Free or Pro plan and do not opt out, your interactions from April 24 onward, up until whenever you flip the switch, are potentially in the training pipeline.
For teams, the calculus is different. If you are running any proprietary work on individual Copilot plans, the April 24 deadline creates an immediate decision point. Either upgrade to Business or Enterprise for contractual data protection, or ensure every team member opts out individually and accept that the protection is preference-based, not contractual.
The connection to the broader AI supply chain security picture is direct. The same week this policy dropped, the LiteLLM supply chain attack demonstrated that AI development tools are high-value targets for credential theft. When your AI coding assistant is simultaneously collecting your code patterns, navigation habits, and project structure, the attack surface for any compromise of that pipeline grows proportionally.
The economics of AI model training create the same pressure across every company in the space. OpenAI just shut down Sora because inference costs were unsustainable. Training costs are equally relentless. Free user data is the cheapest fuel available, and every AI company knows it.
This Is the New Normal
GitHub is not the outlier. It is the latest and largest platform to formalize what has become standard practice across the AI tools industry. Your interactions with AI products are training data unless you specifically prevent it. The burden of protection falls on individual users, not on the platforms collecting the data.
The April 24 deadline is less than 30 days away. Every developer using Copilot on a personal plan should check their settings before then. The opt-out takes 30 seconds. The default takes everything.
Sources: GitHub Blog (Mario Rodriguez), GitHub Changelog, GitHub Community FAQ, The Register, WinBuzzer, gHacks, Help Net Security