GitHub Will Train AI on Your Copilot Data Starting April 24. The Fine Print Is Worse Than the Headline.


Developer Privacy — March 26, 2026


Starting April 24, GitHub will use Copilot interaction data from Free, Pro, and Pro+ users to train AI models by default. The private repo loophole, the enterprise pay-for-privacy moat, and the full data taxonomy reveal more than the opt-out instructions.

Effective Date: Apr 24. Training starts with users opted in by default. Free, Pro, Pro+ all affected. Enterprise excluded at a price.
Default Policy: Opted in. You are opted in unless you actively disable it. Settings buried under privacy preferences.
Privacy Moat: Paid. Enterprise plan excludes training. Privacy is now a premium feature, not a default right.
Microsoft Trains on Output: Not just your code, but also your acceptance and rejection patterns. Behavioral training data at scale.

Sources: GitHub Copilot data policy update; Microsoft privacy terms; GitHub blog announcement; April 2026.

GitHub announced on March 25, 2026 that starting April 24, all interaction data from GitHub Copilot Free, Pro, and Pro+ users will be used to train AI models by default. The change covers inputs, outputs, accepted code snippets, surrounding code context, comments, file names, repository structure, navigation patterns, and feedback signals. Users must manually opt out by visiting github.com/settings/copilot/features and disabling the training toggle. Copilot Business and Enterprise users are contractually exempt. The 30-day notice window expires on April 24.

The headline is about training data. The real story is about what “private repository” now means on GitHub, who pays for actual privacy, and what behavioral data at scale tells Microsoft about how software gets written.

Exactly What GitHub Will Collect

GitHub Chief Product Officer Mario Rodriguez published the full data taxonomy in the policy update. The interaction data includes code snippets shown to Copilot for context (the code surrounding your cursor position when Copilot activates), the suggestions Copilot generates, which suggestions you accept or modify, your prompts in Copilot Chat, and your thumbs-up or thumbs-down feedback ratings. File names, repository structure metadata, and navigation patterns across your codebase are also captured.

This is not just your code. It is your coding behavior. Acceptance patterns reveal which kinds of suggestions developers trust. Rejection patterns reveal where AI-generated code fails to match human intent. Navigation patterns show how developers move through codebases when solving specific types of problems. That behavioral dataset is arguably more valuable to Microsoft than the raw code, because it trains models to predict not just what code looks like but how developers actually work.
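To make the taxonomy concrete, here is a minimal sketch of what one interaction record might look like and how a behavioral signal could be derived from it. The `InteractionEvent` class, its field names, and the `acceptance_rate` helper are illustrative assumptions, not GitHub's actual schema or pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InteractionEvent:
    """One hypothetical Copilot interaction; field names are illustrative only."""
    file_name: str                        # file names are part of the captured data
    context_snippet: str                  # code around the cursor, sent as context
    suggestion: str                       # what Copilot generated
    accepted: bool                        # whether the developer accepted it
    modified_after_accept: bool = False   # accepted, then edited by the developer
    chat_prompt: Optional[str] = None     # Copilot Chat prompt, if any
    feedback: Optional[str] = None        # "up", "down", or None


def acceptance_rate(events: list[InteractionEvent]) -> float:
    """Share of suggestions that were accepted: a simple behavioral signal."""
    if not events:
        return 0.0
    return sum(e.accepted for e in events) / len(events)


events = [
    InteractionEvent("billing.py", "def total(items):", "return sum(i.price for i in items)", accepted=True),
    InteractionEvent("billing.py", "def tax(amount):", "return amount * 0.2", accepted=False, feedback="down"),
    InteractionEvent("auth.py", "def login(user):", "return check(user)", accepted=True, modified_after_accept=True),
    InteractionEvent("auth.py", "def logout(user):", "session.clear()", accepted=True),
]
print(acceptance_rate(events))  # 0.75
```

Even this toy aggregate says nothing about what the code does and a good deal about which suggestions a developer trusts, which is the distinction the section draws between code and coding behavior.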

The Private Repository Loophole

GitHub’s FAQ draws a careful distinction: “We do not use private repository content at rest to train AI models.” The phrase “at rest” is doing significant work in that sentence. Private repository code stored on GitHub’s servers is not scanned for training. But when you open a private repository and use Copilot, the code that Copilot sees during your active session, the context window around your cursor, the completions it generates from your proprietary code, all of that is interaction data. And interaction data is what the new policy covers.

Put differently: your private repo is private until you use Copilot in it. The moment Copilot activates, the code it processes becomes part of the interaction dataset. GitHub is not lying when it says private repos at rest are excluded. It is being precise in a way that most developers will not parse on first reading. The distinction between “code at rest” and “code during an active Copilot session” is the core loophole. Developers who assumed private repositories were fully excluded from training data need to re-read the policy carefully.

The Enterprise Pay-for-Privacy Model

Copilot Business ($19/user/month) and Copilot Enterprise ($39/user/month) are contractually exempt from the training data program. Their interaction data is never used to train models. This creates a two-tier system where privacy is a premium feature rather than a default right. Individual developers on the Free, Pro ($10/month), and Pro+ ($39/month) tiers are the product. Enterprise developers are the customer.

The business logic is straightforward. Enterprise contracts generate predictable revenue and include negotiated terms. Individual developer accounts generate less revenue per user, so GitHub monetizes them twice: once through subscription fees and once through training data. Rodriguez justified the change by citing “established industry practices,” pointing to Anthropic, JetBrains, and Microsoft as companies with similar opt-out policies. That framing normalizes the practice without addressing whether it should be normalized.
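The tier split, the opt-out default, and the "at rest" distinction from the previous section can be condensed into one decision rule. The sketch below is a reading of the policy as this article describes it; the function name, parameter names, and plan labels are hypothetical and do not correspond to any GitHub API.

```python
TRAINING_TIERS = {"free", "pro", "pro+"}    # included by default, per the policy update
EXEMPT_TIERS = {"business", "enterprise"}   # contractually excluded


def used_for_training(plan: str, opted_out: bool = False,
                      seen_in_copilot_session: bool = True) -> bool:
    """Return True if data would fall into the training program as described here."""
    plan = plan.lower()
    if plan not in TRAINING_TIERS | EXEMPT_TIERS:
        raise ValueError(f"unknown plan: {plan}")
    if not seen_in_copilot_session:
        return False   # repository content "at rest" is excluded, per the FAQ wording
    if plan in EXEMPT_TIERS:
        return False   # paid exemption
    return not opted_out   # individual tiers: in unless the user opted out


print(used_for_training("pro"))                    # True
print(used_for_training("pro", opted_out=True))    # False
print(used_for_training("enterprise"))             # False
```

Note that `opted_out` defaults to `False`: a user who never touches the setting lands in the training set, which is the whole argument about defaults.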

What the Community Reaction Reveals

The GitHub Community discussion thread for the policy change showed 59 thumbs-down reactions and 3 positive reactions as of March 27. Developer frustration centers on three points. First, the opt-out default. In the European Economic Area, GDPR requires a lawful basis for every processing purpose, and consent under GDPR must be an affirmative opt-in. GitHub is relying on “legitimate interest” (Article 6(1)(f)) rather than consent as its legal basis for EEA users. That basis depends on a balancing test against users’ rights and expectations, which makes it the justification most likely to be challenged by data protection authorities.

Second, the scope creep. GitHub’s original Copilot launch explicitly positioned the product as not using individual interaction data for training. The April 24 change reverses that commitment. Users who signed up for Copilot under the old terms now face a retroactive change to data handling. GitHub’s response: users who previously opted out retain their preference. But users who never made an active choice are now opted in.

Third, the affiliate sharing clause. The updated privacy statement confirms that interaction data “may be shared with GitHub affiliates, which are companies in our corporate family including Microsoft.” Microsoft operates Azure OpenAI Service, Microsoft Copilot (formerly Bing Chat), and multiple AI products that compete with standalone developer tools. The statement says data will not be shared with “third-party AI model providers,” but Microsoft itself is the largest AI model provider in GitHub’s corporate family.

How to Opt Out (and What Opting Out Does Not Cover)

Navigate to github.com/settings/copilot/features. Under the Privacy heading, disable “Allow GitHub to use my data for AI model training.” The toggle takes effect immediately. Interaction data collected before you opt out may already be in the training pipeline. GitHub does not specify a deletion timeline for previously collected interaction data, only that collection stops going forward.

Opting out of training does not opt you out of all data collection. GitHub still collects interaction data for “service operation and improvement,” which is a separate category from model training. The distinction between data used to run the service and data used to train models is real but narrow. Service operation data includes the same inputs and outputs. The difference is what happens downstream.
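The timing matters, so here is a rough sketch of which uses apply to a given interaction. It assumes that only interactions on or after the effective date are eligible for training and that the cutoff falls at midnight UTC; neither detail is stated in the policy as summarized here, and the `data_uses` function and its labels are hypothetical.

```python
from datetime import datetime, timezone

# Effective date from the announcement; time of day and timezone are assumptions.
EFFECTIVE = datetime(2026, 4, 24, tzinfo=timezone.utc)


def data_uses(event_time, opt_out_time=None):
    """Return the set of uses that apply to one interaction under this reading."""
    uses = {"service_operation"}   # collected regardless of the training toggle
    after_start = event_time >= EFFECTIVE
    before_opt_out = opt_out_time is None or event_time < opt_out_time
    if after_start and before_opt_out:
        uses.add("model_training")   # may already be in the pipeline when you opt out
    return uses


opt_out = datetime(2026, 5, 1, tzinfo=timezone.utc)
print(sorted(data_uses(datetime(2026, 4, 25, tzinfo=timezone.utc), opt_out)))
# ['model_training', 'service_operation']
print(sorted(data_uses(datetime(2026, 5, 2, tzinfo=timezone.utc), opt_out)))
# ['service_operation']
```

Under this reading, a user who opts out on May 1 has still contributed a week of interactions, and nothing in the policy as described commits GitHub to removing them.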

What This Means for Competitors

Cursor markets a “Privacy Mode” that offers zero data retention when enabled. Cursor’s model does not train on user data in privacy mode, full stop. That positioning becomes significantly more valuable after April 24. JetBrains’ AI assistant similarly allows users to run local models with no data leaving the machine. The GitHub policy change hands competitors a differentiation lever they did not have to build. They just have to point at it.

For OpenAI, the dynamics are more complex. OpenAI provides the models that power Copilot, but GitHub’s updated terms specifically state that interaction data will not be shared with third-party AI model providers. If that firewall holds, OpenAI does not benefit directly from Copilot training data. Microsoft’s internal models do. The question of whether those Microsoft models eventually compete with OpenAI’s own products is a strategic tension that neither company has resolved publicly.

The Bottom Line

GitHub is not doing anything unusual by AI industry standards. Every major AI platform is converging on the same playbook: use interaction data to train better models, offer opt-out rather than opt-in, exempt enterprise customers who pay more. The April 24 change is notable because of GitHub’s position as the default code hosting platform for most of the world’s developers. When GitHub changes its data policy, the blast radius is measured in tens of millions of developers, many of whom will never read the announcement, never find the settings page, and never know their coding patterns are training the next version of the tool they use every day.

Sources: GitHub official blog announcement, March 25, 2026; GitHub Community FAQ Discussion #188488; The Register analysis; GitHub Privacy Statement update; GDPR Article 6(1)(f) legitimate interest provisions.
