Digimarc adds copyright information to digital data


Software company Digimarc will now let copyright owners attach more information to their work, which the company says will improve how AI models treat copyrighted material in training data.

In a statement, Digimarc said its new Digimarc Validate service allows users to embed ownership information in metadata. The company said this means that when copyrighted material ends up in a generative AI training dataset, users can point to the digital watermark containing the intellectual property information.

For example, an image with Digimarc Validate carries a machine-readable © symbol and includes copyright owner information. The company said Digimarc Validate is powered by its digital watermark detection software, called SAFE (secure, accurate, fair and efficient), which AI companies would need to adhere to if they want to keep copyrighted material bearing the Digimarc Validate symbol out of training datasets.

“Generative AI has changed the rules, and once digital assets are distributed or published, the ability to protect these valuable assets disappears,” said Riley McCormack, president and CEO of Digimarc.

Digimarc said that much of the content in datasets collected for AI training “is copyrighted; it is simply not digitally identified as such.” The watermark is meant to let generative AI models identify which data is protected before ingesting it for training.
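As a rough illustration of what "identify which data is protected before the model ingests it" could look like in a training pipeline, here is a minimal Python sketch. It does not use Digimarc's software or API. The `Asset` class, the `copyright_owner` metadata key, and the `split_for_training` helper are all hypothetical names chosen for the example, and a plain dictionary stands in for whatever a real watermark detector would return.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A dataset item plus whatever metadata a detector extracted from it."""
    path: str
    metadata: dict = field(default_factory=dict)


def is_marked_copyrighted(asset: Asset) -> bool:
    # Assumption: a non-empty "copyright_owner" entry means the item
    # carries ownership information. This key is hypothetical.
    return bool(asset.metadata.get("copyright_owner"))


def split_for_training(assets: list[Asset]) -> tuple[list[Asset], list[Asset]]:
    """Separate items that may be ingested from items that are held back."""
    allowed = [a for a in assets if not is_marked_copyrighted(a)]
    held_back = [a for a in assets if is_marked_copyrighted(a)]
    return allowed, held_back


assets = [
    Asset("cat.jpg", {"copyright_owner": "Example Studio"}),
    Asset("tree.jpg"),
    Asset("river.jpg", {"copyright_owner": ""}),
]
allowed, held_back = split_for_training(assets)
print([a.path for a in allowed])    # items with no ownership mark
print([a.path for a in held_back])  # items flagged as copyrighted
```

The filter only helps if the developer chooses to run it, which is the limitation the article goes on to describe.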

Noting copyright ownership in content metadata sounds good on paper, but it will only work if AI developers actively avoid copyrighted material when training models. So far, AI companies have not promised to stay away from copyrighted material in training datasets. However, a digital trail of copyright ownership could allow creators to flag cases where AI developers are intentionally infringing those protections.

Some AI companies, like Adobe, have said they only use licensed data for training purposes. OpenAI announced that websites can block its crawler from using their data for training.
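Crawler blocking of this kind is typically expressed in a site's robots.txt file. The sketch below uses Python's standard `urllib.robotparser` to show how such a rule is read. The article does not name the crawler; `GPTBot` is the user agent OpenAI has documented for it, while the `example.com` URL and the robots.txt contents are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt that disallows one crawler site-wide.
robots_txt = """\
User-agent: GPTBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://example.com/articles/some-page"
gptbot_allowed = parser.can_fetch("GPTBot", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print("GPTBot allowed:", gptbot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

Like metadata watermarks, robots.txt is a voluntary convention: it works only when the crawler chooses to honor it.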

Meanwhile, Microsoft said it would assume the legal risk if commercial customers using its Copilot products were sued for copyright infringement.

Some of the first lawsuits against developers of generative AI models focus on the thorny issue of copyright infringement. Comedian Sarah Silverman and authors Christopher Golden and Richard Kadrey sued OpenAI and Meta for allegedly using their books to train GPT-4 and Llama 2. Three artists filed a lawsuit against Stable Diffusion, Midjourney and the art website DeviantArt for allegedly violating their copyrights.

To help determine how best to approach AI and copyright issues, the U.S. Copyright Office opened a public comment period on August 30 to understand people’s concerns.

While the White House has secured commitments from AI companies to develop watermarks, the purpose of these watermarks is to identify AI-generated content.

“The risk content owners face if they fail to add an identifier to digital assets before distribution or publication goes beyond simple misuse and theft,” said Ken Sickles, director of Digimarc products. “In the future, your digital assets will make e-commerce transactions more reliable, email more secure, and social media a safer place.”

Digimarc Validate is available for commercial use, starting at $399 per month. Enterprise customers can work with the company for pricing options.