Why C2PA is necessary, what it can and cannot do.

C2PA is "An open technical standard providing publishers, creators, and consumers the ability to trace the origin of different types of media." C2PA stands for Coalition for Content Provenance and Authenticity.  In plain language, it adds a manifest to a creative work so that you can trace what created it and what has happened to it since it was created.

C2PA has become important because knowing where files of all types come from (their provenance, in C2PA terms) has become critical.  Our need to know more about where our files come from is driven by the ability of AI to manipulate images. It is valuable to know, for example, whether I wrote this, whether it was generated by AI, or whether I wrote it with AI assistance.

What will C2PA be used for?

Tracking down abusers

AI makes it possible to create similar but not identical versions of a work, and without C2PA those versions are hard to trace.  For example, someone may alter a photograph to make it appear that a person was in the picture when that person was never there.  With C2PA, the person altering the picture will either indicate that the photo was edited, or they will not update the C2PA data and there will be a gap in it.  Either way, this allows for a forensic examination of the photo that is otherwise difficult.

Protecting journalism and the rights of the public

Through the mechanisms of C2PA, it will be possible to identify where a picture was taken and how it has been modified.  C2PA can provide forensic information, helping to counter things like deepfakes.  As technology gets better at creating fake images and videos, the ability to trace the provenance of those images becomes critical to being able to trust that what you are viewing is from a real source.

What can C2PA be applied to?

C2PA can be applied to the most common video and image formats.  Text, audio, Word, PDF and other documents are also supported.  In the future, just about any creative work that benefits from having a known provenance should be supported by C2PA.

How does C2PA work?

C2PA adds a new type of metadata to files: the C2PA manifest.  For things like images, many people are already familiar with some of the metadata that is commonly attached, such as the location where a picture was taken.  The C2PA manifest adds information about the creation and modification of an image.  C2PA also cryptographically signs that information so that tampering is easy to detect.

Here is an example of what might be recorded in the C2PA manifest:

    1) The specific camera that originally took the photo.
    2) The software that later cropped the photo and added a title to the image.
    3) The AI tool that edited the image to change the color of the sky.

All of the C2PA information is cryptographically signed such that it can be traced back to how, what, where and when it was created or modified.  In this way, an image, a video or a piece of writing has a history of where it came from (aka its provenance).
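
To make the idea of a signed manifest concrete, here is a minimal Python sketch.  It is not the C2PA format: real C2PA manifests are signed with certificate-backed keys, while this sketch uses a shared-secret HMAC from the standard library purely to show how a signature exposes tampering.  The key, the field names and the tool names are all made up for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical demo key; real C2PA signers use certificate-backed private keys.
SIGNING_KEY = b"demo-key-not-for-production"


def sign_manifest(manifest: dict, key: bytes = SIGNING_KEY) -> str:
    """Return a hex signature over a canonical JSON encoding of the manifest."""
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, key: bytes = SIGNING_KEY) -> bool:
    """Check that the manifest still matches the signature it was issued with."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)


# Illustrative history mirroring the three steps listed above.
manifest = {
    "actions": [
        {"action": "captured", "agent": "Example Camera"},
        {"action": "cropped_and_titled", "agent": "Example Photo Editor"},
        {"action": "sky_color_changed", "agent": "Example AI Tool"},
    ]
}

signature = sign_manifest(manifest)
print(verify_manifest(manifest, signature))  # True

# Quietly dropping the AI edit from the history breaks the signature.
tampered = json.loads(json.dumps(manifest))
tampered["actions"].pop()
print(verify_manifest(tampered, signature))  # False
```

Note that the signature only proves the manifest has not changed since it was signed.  It says nothing about whether the claims inside it were true to begin with.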

User information with CAWG

C2PA has a related extension called CAWG (Creator Assertions Working Group), which identifies a creative work as being made by a particular person and identifies who has modified it afterwards.  CAWG enhances what C2PA already provides, extending its use to cover what individual users have done within a C2PA manifest.

What will CAWG be used for?

Identifying good actors

CAWG allows a person examining the manifest to determine who to speak to about the use of the creative work.  So, for example, if a company wants to use a particular work of art to train their AI models, they can identify who to contact by looking at the manifest.  Watermarks can currently be added to help someone track down the owner of a creative work, but they do not necessarily describe who owns the work at this time.

What else can we do with all this?

Communicating acceptable use/copyright

Additional standards will be able to work with CAWG and provide indications of things like ownership, but in a more verifiable manner than is currently possible.  Using IPTC (International Press Telecommunications Council) standards, for example, you can provide copyright metadata.  C2PA can update this information and can track it for a piece of work even if that work is digitally duplicated.

What are the problems with C2PA?

Loss of the manifest

One issue with C2PA is that files can be modified, and when they are, the manifest that was added to track the file can be lost.  Soft binding, which we will get to later, can help with this.  But in the end there isn't any one solution that will preserve the manifest in all cases.  A common example is that you can take a picture of a picture or a recording of a movie and whatever manifest existed before is completely lost.

Privacy concerns

A big issue with metadata, particularly with respect to images, is privacy.  Many online social networks remove at least some metadata to avoid leaking information like the location where a picture was taken.  There are cases where, due to privacy concerns, the C2PA manifest will be removed from the file's metadata.  This is a very serious concern and one that can only really be dealt with through education.  Some provenance information may, for example, put someone in danger.  Education, and building tools that help people understand the choices they are making as they use C2PA, will let people decide what they do and don't want to share.

Redaction

Redaction is a built-in approach to removing data from C2PA.  An example where it might be used would be to protect a photographer in a war zone who needs to remain anonymous.  Redaction is designed to let people deliberately remove some or all of the C2PA manifest from a creative work.  This gets very complex, as some information shouldn't be removed.  Sometimes, even the details of what is being removed shouldn't be recorded.  This is similar to the issue addressed by Right to be Forgotten laws.  With possibly multiple copies of a creative work in existence, it may be very hard to completely redact information attached to C2PA.

Is it true?

A significant issue with C2PA is "how true is this?"  The C2PA manifest may, for example, claim that the image wasn't created by AI, or that it was shot by a famous photographer when it was shot by me.  C2PA doesn't validate that the information it is providing is true.  Just as we can't trust other people to be truthful (see the entire internet), C2PA is only a signal that we can use to make that judgment for ourselves.

What are solutions to those problems?

Watermarking

The watermark used here is usually an imperceptible one.  Watermarks have historically been used for identifying ownership.

Hash data

A data hash creates a near-unique alphanumeric string for a file. This provides a second key that can be used to uniquely identify a file.
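
As a small illustration, a data hash can be computed with Python's standard library.  SHA-256 is used here only as an example algorithm, and the byte strings stand in for real file contents.

```python
import hashlib


def file_digest(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Stand-ins for the bytes of an image file before and after a tiny change.
original = b"example image bytes"
edited = b"example image bytes!"

print(file_digest(original) == file_digest(original))  # True
print(file_digest(original) == file_digest(edited))    # False
```

Because even a one-byte change produces a different digest, this kind of hash identifies an exact file, not a work that has been resized or recompressed.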

Incompleteness & duplication

Incompleteness: If an edit is made without C2PA, it can be marked as a gap in the C2PA provenance.

Duplication: It is entirely acceptable to have multiple versions of a work.  For example, maybe I make one illustration with blue seas and another near-identical illustration with green seas.  In this case, they should have different C2PA manifests.
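
One way to picture the incompleteness case is as a chain of edits where each recorded step notes the hash of the content it started from and the hash of the content it produced.  The Python sketch below is a simplified model under that assumption, not the actual C2PA data structure, and the `Edit` record and `find_gaps` helper are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


def digest(data: bytes) -> str:
    """SHA-256 hex digest, used here as the content identifier."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Edit:
    """Hypothetical record of one tracked step in a work's history."""
    tool: str
    input_hash: str
    output_hash: str


def find_gaps(history: list[Edit], current: bytes) -> list[str]:
    """Report places where the content changed without a recorded step."""
    gaps = []
    # Each step should start from exactly what the previous step produced.
    for prev, nxt in zip(history, history[1:]):
        if prev.output_hash != nxt.input_hash:
            gaps.append(f"unrecorded change between {prev.tool} and {nxt.tool}")
    # The file in hand should match what the last recorded step produced.
    if history and history[-1].output_hash != digest(current):
        gaps.append(f"unrecorded change after {history[-1].tool}")
    return gaps


v1 = b"photo as captured"
v2 = b"photo after crop"
v3 = b"photo after untracked edit"  # made with a tool that records nothing
v4 = b"photo after color fix"

history = [
    Edit("camera", digest(b""), digest(v1)),
    Edit("crop tool", digest(v1), digest(v2)),
    Edit("color tool", digest(v3), digest(v4)),
]

print(find_gaps(history, v4))
# ['unrecorded change between crop tool and color tool']
```

The gap does not say what the untracked edit was, only that something happened between two recorded steps.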

Soft Binding

Combining a watermark and a soft hash provides enough key information to re-attach the C2PA manifest to the creative work.  In this way the provenance can be returned to the work, even if your work is posted on a site that removes the manifest.
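
The soft hash half of this can be sketched in Python.  The example below uses a very simple "average hash" over a tiny grayscale grid as the fingerprint and a plain dictionary as the manifest store.  Both are assumptions made for illustration: real systems use more robust fingerprints, and the watermark half is not shown.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Fingerprint with one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def find_manifest(pixels, store: dict[int, str], max_distance: int = 2):
    """Return the stored manifest whose fingerprint is closest, if it is close enough."""
    fp = average_hash(pixels)
    best = min(store, key=lambda k: hamming(k, fp), default=None)
    if best is not None and hamming(best, fp) <= max_distance:
        return store[best]
    return None


# A 4x4 grayscale "image" (0 = black, 255 = white).
original = [
    [200, 210, 30, 20],
    [190, 220, 25, 35],
    [40, 30, 180, 200],
    [20, 45, 210, 190],
]
# Simulates a copy whose metadata was stripped and whose brightness shifted slightly.
recompressed = [[min(255, p + 5) for p in row] for row in original]
unrelated = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]

# Hypothetical manifest store keyed by fingerprint.
store = {average_hash(original): "manifest-for-original"}

print(find_manifest(recompressed, store))  # manifest-for-original
print(find_manifest(unrelated, store))     # None
```

Unlike the exact data hash, this fingerprint tolerates small changes to the file, which is what lets a stripped copy be matched back to its manifest.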

Blockchain 

A good question is, "Wouldn't blockchain provide a transparent and clear ability to trace this stuff?"

The generally accepted answer is that blockchain is attractive, but that the sheer number of creative works would be an issue for a blockchain to maintain.  There are over 100 billion images on Google's image search.  Blockchain can be used with C2PA, but it isn't a requirement of the specification.

Why is C2PA still valuable, regardless of the issues with it?

Functionally, at this point, there is no way to track where an image came from, how someone might use it or what they are allowed to use it for.  There are services that will look for copyright infringement, for example, but it is unclear how a person would legally get in contact with the original owner of a picture.  Additionally, without C2PA there is no way to tell AI specifically what to do with the photo (maybe you are OK with the AI using it for training if you get paid for the data).

What does all this have to do with Long Tailed Leopard?

Long Tailed Leopard is implementing a tool (in Alpha now: https://ltlprotect.com/metadata_modifier/) to allow a user to add metadata to their images and videos.  After the initial alpha period, we will be exposing an API for large scale use.  Soon, Long Tailed Leopard will empower you to inform AI that your work should not be used for training their systems.
