Artificial Intelligence (AI) Training: Ask for Forgiveness or Ask for Permission?

In recent months we have seen exponential growth in technologies that use generative Artificial Intelligence (AI) through Large Language Models (LLMs).

In very simple terms, these technologies build models known as “transformers”: neural networks that learn context, and therefore meaning, by tracking relationships in sequential data. When certain parameters or queries (input) are entered, the language model produces a result (output), based mainly on predictions about the most likely or appropriate answer.
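To make this input/output prediction loop concrete, here is a minimal sketch in Python. It uses the open "gpt2" model through the Hugging Face transformers library; both the library and the model are illustrative choices not mentioned in the article, and any production system would be far more elaborate.

```python
# A minimal sketch of the query (input) -> prediction -> result (output) loop
# described above. The "gpt2" model and the Hugging Face `transformers` library
# are illustrative assumptions, not anything named in this article.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Artificial intelligence is"            # the query (input)
inputs = tokenizer(prompt, return_tensors="pt")  # text -> token IDs

# The model repeatedly predicts the most likely next token,
# extending the sequence one token at a time.
output_ids = model.generate(**inputs, max_new_tokens=20)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))  # the result (output)
```

The key point for the legal discussion that follows is visible even in this toy example: everything the model can "say" is a statistical echo of the data it was trained on.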

The problem that arises from the widespread use of these technologies is that their power source is the data. The system makes its predictions and generates its answers on the basis of those data. In this way, the energy or “gasoline” of these systems is the data loaded into them, since the system relies on them to generate its responses.

And here lies one of the major problems arising from the use of these technologies: in the vast majority of cases there is no transparency. The data loaded into these predictive models are essential not only for understanding the results they produce, but also for determining whether the loading and training of these systems affect third-party rights enshrined and protected by the laws of different countries.

Indeed, there are a number of legal implications in both the use and the prior training of these tools, which could affect various third-party rights. Just as an example, consider the potential impact on privacy and the processing of personal data: the “packages” of information could include confidential data and even sensitive personal data, such as age, race, contact information, sexual preferences, or medical records, among many others that may be of concern.

Another emerging risk is bias, or the lack of objectivity, that may result from the use of these generative AI tools. Depending on the data and information fed into them, they could generate answers loaded with prejudice. Beyond the underlying philosophical questions (such as whether these prejudices are simply a reflection of our reality as human beings), such prejudices should be kept out of these systems, so that the answers provided are as objective as possible.

This is especially delicate if one imagines, for example, that in the future these AI systems will be used to decide a conflict or litigation (a kind of virtual judge). If their decision-making capacity is shaped by the data loaded into the tool, it is quite likely that they will lack the required objectivity, since they could judge a person by origin, race, skin color, sex, or other such characteristics.

In addition to the above, there are cybersecurity concerns, as these tools are being misused to mount cyberattacks against companies, critical infrastructure (such as hospitals or transportation systems), or individuals, through scams, hacks, or the generation of malware.

This brings us back to the point that both the use of these tools and the data used in their training are crucial to understanding the results they generate and the purposes for which they are intended. It would be interesting to know what effective control measures companies, governments, and even individuals themselves are adopting in order to avoid the serious impact they could suffer from attacks of this type.

In addition to the above, another growing concern is the use of material or works protected by intellectual and industrial property law to train these generative AI systems, an issue that is increasingly giving rise to legal disputes, with rights holders alleging the unauthorized use of their works to train these systems (and we have even seen strikes and lawsuits by artists, writers, and unions of actors, writers, and screenwriters against the providers of services such as ChatGPT).

And since, at least so far, there is no transparency regarding the data used to train these systems, we are left with black boxes in which it is not known what data were used, many of which could well be protected under current legislation.

Notwithstanding the above, it is evident that these technologies are “stressing” the current intellectual property system, confronting it with a series of questions. Just as an example, it is worth asking whether it is really reasonable to disregard the work of a human being who, after multiple attempts at or modifications of an instruction or prompt, achieves a certain result using these generative AI tools. Consider the recent case of a U.S. court that refused to grant protection to an AI-generated work, even though its owner claimed to have entered more than 640 prompts and performed post-editing work in Photoshop before achieving the final image. Is it feasible to qualify the result of this effort and its multiple modifications as “unexpected,” and therefore exempt from the legal protection conferred on this type of work?

On the other hand, is it reasonable, given the current speed of technological progress, to maintain the current terms of protection for works, which as a general rule last for 70 years after the death of the author? Are the current regulations sufficient to face the staggering reality of the facts, and the avalanche of works created every day using generative AI? It is worth remembering that, according to recent estimates, as many images were generated with AI in less than a calendar year as in the entire history of photography… Are new protection mechanisms required for works created with generative AI?

The only certainty we have today is that there is as yet no regulation, set of norms, or legal framework governing the use of these new technological tools that would provide certainty to both their users and their providers. Although some initiatives are underway (such as that of the European Union) to reach consensus on these and other matters, we will have to wait a while to see the outcome of this regulatory effort, along with its implications, scope, and effectiveness.

The idea is to achieve a necessary equilibrium or balance: regulation should not paralyze or delay the wave of innovative technological advances (for example, by establishing administrative oversight bodies that breed bureaucracy, or by creating obstacles or technical and economic requirements so high that only a few players can meet them), but it should, on the other hand, safeguard the basic principles and rights that could be affected by the training and use of these tools.

All this is still very recent, and we will have to see how it evolves over time.

At least for now, and answering the question we initially posed, the attitude of most providers of these systems has been (and will probably continue to be, pending a regulation that governs or prevents it) to ask for forgiveness rather than to ask for permission.