I’ve been writing online for about five years. Blog posts, brand content, some personal essays on platforms that index publicly. Most of that work was almost certainly scraped into at least one training corpus. I never consented to that. I also can’t prove it happened. And I’m genuinely not sure how I feel about it.
Part of me finds it genuinely troubling. I write pieces that are specifically my perspective, my voice, my framing. The idea that those patterns got absorbed into a model that now generates similar patterns on command, without payment or credit - that doesn’t feel neutral.
Another part of me isn’t sure what harm I can actually point to. Nobody’s claiming my words. My clients aren’t losing to AI that outputs my exact sentences. The influence, if any, is distributed and untraceable. A human writer who read my work and absorbed stylistic influence wouldn’t owe me anything. Is AI meaningfully different?
The comparison breaks down somewhere. Scale matters. A human reads my work and is influenced in ways that are idiosyncratic and hard to replicate. A model ingests millions of writers including me and produces output that reflects a statistical average. I’m not credited, not compensated, not present in the output in any traceable way.
The legal question and the ethical question feel separate to me. Legally there are ongoing cases that might land anywhere. Ethically I keep turning this over and not reaching a clean conclusion.
Do other writers here have a settled view? Or is everyone sitting with the same ambivalence?
Not ambivalent. The scale argument does it for me.
A human who reads my work and writes something influenced by it is one voice, adding something. A model that ingests thousands of voices produces output that flattens them all and then competes in the same market for the same budgets. The harm is diffuse but the asymmetry is real. I contributed value. I received nothing. That’s not fine just because it’s hard to trace.
Major publishers have signed licensing deals with AI companies. Terms aren’t public. The authors whose work those publishers hold rights to weren’t consulted. The publishers got paid. The authors didn’t.
Whether that’s legally defensible is an open question. Ethically it’s fairly clear. The product of the training competes directly with the authors who contributed to it. That seems like harm even if the mechanism is diffuse.
The technical framing matters more than I usually see it acknowledged.
Training doesn’t store or reproduce your text. It’s gradient descent - adjusting billions of weights across a network. Your specific sentences don’t persist anywhere. What persists is statistical influence on parameter values.
Whether that requires consent or compensation is a legal and ethical question, not a technical one. I’m not saying the answer is no. I’m saying “they’re using my work” is imprecise, and precision might matter for thinking about remedies.
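To make that concrete, here's a minimal sketch - a toy bigram model trained by plain gradient descent on a handful of words. The corpus, the variable names, and the whole setup are purely illustrative, nothing like a real training pipeline, but the point carries: the text shifts the weight values, and what's left afterward is a grid of numbers shaped by the corpus statistics, not the sentences themselves.

```python
import numpy as np

# Toy corpus standing in for "scraped writing" - purely illustrative.
corpus = "my perspective my voice my framing".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(V, V))  # logits for next word given current word

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

lr = 0.5
for _ in range(200):  # repeated gradient-descent passes over the corpus
    for cur, nxt in zip(corpus, corpus[1:]):
        i, j = idx[cur], idx[nxt]
        p = softmax(W[i])        # model's predicted next-word distribution
        grad = p.copy()
        grad[j] -= 1.0           # gradient of cross-entropy loss w.r.t. logits
        W[i] -= lr * grad        # nudge the weights; the sentence is never stored

# W is now a small matrix of numbers that reflects the corpus statistics
# (e.g. what tends to follow "my"), but no row or cell contains the text.
print(W.round(2))
```

Whether that distinction changes the ethical answer is exactly the open question, but it's the accurate picture of what "using my work" means mechanically.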
I’ve started including contractual language with clients around AI training data usage. Some find it annoying. I think it’s a reasonable ask given that some platforms’ terms grant them rights to use submitted content for training.
The ambiguity exists partly because there’s no established norm yet. A few years from now this will probably be more routinely addressed in contracts. Right now it’s genuinely unresolved and most creators are just absorbing the uncertainty.
Settled view: the ethical case for compensation is strong and the practical path to it is slow.
Realistic near term - creators who care will make it a contract term, opt-out mechanisms will improve through legal pressure, some licensing deals will eventually standardize for high-volume creators. Smaller creators whose work was scraped but who can’t prove harm will mostly see nothing.
I can only control the terms under which I produce new work, not what’s already out there. Limited form of agency. Currently the only one available.