
Teaching & Learning


Introduction to Generative AI

Explore Generative AI (GenAI), how it works, and its strengths and weaknesses. Find guidance to help ensure that use of GenAI is effective, ethical and transparent.


What is GenAI (such as ChatGPT)?

GenAI is an Artificial Intelligence (AI) technology that automatically generates content in response to written prompts. The generated content includes text, software code, images, videos and music.

GenAI is trained using data from webpages, social media conversations and other online content. It generates its outputs by statistically analysing the distribution of words, pixels or other elements in the data that it has ingested, and by identifying and repeating common patterns (for example, which words typically follow which words).

There are many other types of AI application that do not involve GenAI but that are also having an impact on teaching and learning. These other types of AI (an area known as ‘teaching and learning with AI’ or ‘AIED’) will be addressed in more detail as these web pages develop.

Remember:

  • GenAI looks accurate… but it isn’t
  • GenAI looks intelligent… but it isn’t
  • GenAI looks as if it understands… but it doesn’t.

Microsoft Copilot, also known as Bing Chat Enterprise

Staff and students can access Microsoft Copilot, which can be used for both text and image generation. With commercial data protection, this is intended as a more secure alternative to other GenAI services. If you wish to use GenAI, this is the safest way to do so.

If you're logged into Microsoft Copilot with your credentials, what goes in – and what comes out – is not saved or shared, and your data is not used to train the models.

Find out more about how to access Microsoft Copilot.

Practical guidance on how educators can use Microsoft Copilot is also available.


Text GenAI 

In response to a human-written prompt, text GenAI generates text that usually appears as if a human has written it.

Yet, just like human-written texts, text GenAI outputs can be superficial, inaccurate, untrustworthy, and full of errors. 

“Large language models [which is the technology behind text GenAI] are the ultimate bullshitters because they are designed to be plausible (and therefore convincing) with no regard for the truth.” – Associate Professor Carissa Véliz, University of Oxford

Despite appearances, text GenAI does not understand either the prompt written by the human or the text that it generates. 

Every time that we use a text GenAI tool, we need to consider its output from a sceptical perspective.

Examples of text GenAI tools:
  • (Google)
  • (OpenAI) 
  • (Anthropic) 
  • (HuggingFace) 
  • (Meta) 

Please note that none of the tools in this list is recommended or endorsed. Microsoft Copilot is now available for students and staff. Read more about using Copilot.

Examples of other tools built on top of these GenAI tools:
  • (summarises and answers questions about submitted PDF documents) 
  • (aims to automate parts of researchers’ workflows, identifying relevant papers and summarising key information) 
  • (Google Chrome extension that gives ChatGPT Internet access, to enable more accurate and up-to-date conversations)
  • Microsoft has incorporated ChatGPT into its Bing search engine and is implementing ChatGPT across its Office portfolio.

Please note that none of the tools in this list is recommended or endorsed. Microsoft Copilot is now available for students and staff. Read more about using Copilot.


Image/video/music GenAI 

Image, video and music GenAI can generate outputs based on human-written prompts. Some can also respond to visual or musical prompts. 

Again, image, video and music GenAI outputs might appear novel. However, they are usually only complex combinations of the millions of images, videos or pieces of music that the tools ingested during their training.

On the one hand, this is how creativity often works. For example, Rock and Roll music combined ideas from R&B, gospel and country music. 

But importantly, Rock and Roll drew on ideas from earlier works, whereas GenAI actually uses the earlier works in its outputs, without the consent of the original creators.

Another issue raised by image GenAI is how difficult it can be to write an effective prompt. For example, the breakthrough AI image Théâtre D’opéra Spatial took weeks of prompt writing and the fine-tuning of hundreds of images.

Examples of image/video and music GenAI tools:
  • (OpenAI’s image GenAI tool)
  • (Stable Diffusion’s image GenAI tool)
  • (Image GenAI tool) 
  • (Video GenAI tool)
  • (Music GenAI tool)
  • (Music GenAI tool).

Please note that none of the tools in this list is recommended or endorsed. Microsoft Copilot is now available for students and staff. Read more about using Copilot.



How does Generative AI work? 

Both text and image GenAI are based on a set of AI techniques that have been available to researchers for several years and have been built one on top of another.

Text GenAI 

Although the terms listed below are all often used in descriptions of GenAI, it isn’t necessary to understand exactly what they all mean. The most important things to note are the hierarchy of the technologies and their complexity. 

ChatGPT (and other text GenAI) is a type of:

Generative Pre-trained Transformer (GPT: an advanced type of LLM),

which is a type of Large Language Model (LLM: a massive computer-based representation of examples of natural language),

which is a type of General-purpose Transformer (an ANN language processor),

which is a type of Artificial Neural Network (ANN: an ML approach inspired by how the human brain works, with its synaptic connections between neurons),

which is a type of Machine Learning (ML: an approach to AI that uses algorithms to automatically improve its performance from data),

which is a type of Artificial Intelligence.


Issues around training a text GPT 

So that a text GenAI can generate text, it first has to be trained. This involves the tool being provided with, and processing, huge amounts of data scraped from the internet and elsewhere. It is reported, but not confirmed by OpenAI, that the training of GPT-4 involved a million gigabytes of data. Processing this data involves identifying patterns, such as which words typically go together (e.g. “Happy” is often followed by “Birthday”).
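The pattern-counting idea can be illustrated with a toy sketch: counting which word follows which in a tiny invented corpus. Real GPT training is vastly more complex (it works on tokens and uses deep neural networks), but the statistical intuition is similar.

```python
# Count which word follows which in a tiny invented corpus.
# This is an illustrative sketch of pattern identification, not real GPT training.
from collections import Counter, defaultdict

corpus = "happy birthday to you happy birthday dear friend happy new year".split()

# Bigram counts: how often each word follows each other word.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

# In this corpus, "birthday" is the most common word to follow "happy".
print(following["happy"].most_common(1))  # [('birthday', 2)]
```

A model trained this way has no understanding of birthdays; it has merely recorded that the two words often co-occur, which is the pattern it will later repeat.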

Carbon footprint

Training a GPT requires huge amounts of power and indirectly generates huge amounts of carbon, with important consequences for climate change. For example, it is estimated that the training of GPT-3 (the GPT used by the first version of ChatGPT made available to the public) consumed 1,287 megawatt hours of electricity and generated 552 tons of carbon dioxide, equivalent to the emissions of 123 cars driven for one year.

Feedback loop

Another concern is that when future GPTs are trained, the data that they ingest are likely to include substantial amounts of text generated by previous versions of GPT. This self-referential loop might contaminate the training data and compromise the capabilities of future GPT models. 

Human costs

Once the text GenAI model is trained but before it is used, it is often checked and refined in a process known as Reinforcement Learning from Human Feedback (RLHF). In RLHF, text GenAI responses are reviewed and validated by human reviewers. These human reviewers ensure that the GenAI responses are appropriate, accurate, and align with the intended purpose. Sometimes the provider of the GenAI then sets up what are known as ‘guardrails’ to prevent the GenAI generating objectionable materials. 

In the development of ChatGPT, the RLHF reviewers were mostly workers in Global South countries such as Kenya. They were paid less than $3 per hour to review the outputs of ChatGPT and to identify any objectionable or distressing materials. This work has had a serious negative impact on many of those involved.


How a GPT generates text 

Once the GPT has been trained, generating a text response to a prompt involves the following steps: 

1. The prompt is broken down into smaller units (called tokens) that are input into the GPT. 

2. The GPT uses statistical patterns to predict likely words or phrases that might form a coherent response to the prompt. 

  • The GPT identifies patterns of words and phrases that commonly co-occur in its prebuilt large data model (which comprises text scraped from the Internet and elsewhere). 
  • Using these patterns, the GPT estimates the probability of specific words or phrases appearing in a given context. 
  • Beginning with a random prediction, the GPT uses these estimated probabilities to predict the next likely word or phrase in its response. 

3. The predicted words or phrases are filtered through what are known as ‘guardrails’ to remove any offensive content. 

4. Steps 2 to 3 are repeated until a response is finished. The response is considered finished when it reaches a maximum token limit or meets predefined stopping criteria. 

5. The response is post-processed to improve readability by applying formatting, punctuation, and other enhancements (such as beginning the response with words that a human might use, such as “Sure,” or “Certainly,” or “I’m sorry”). 
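The generation loop in steps 1 to 4 above can be sketched in miniature. Here the “model” is a hand-built table of invented next-word probabilities standing in for a GPT’s learned patterns, and the stop token stands in for the predefined stopping criteria; everything in this sketch is a simplification for illustration.

```python
# A toy sketch of steps 1-4: generate a response one token at a time by
# repeatedly sampling the next word from estimated probabilities.
# The probability table below is invented for the demo, not a real model.
import random

random.seed(0)

next_word_probs = {
    "happy":    {"birthday": 0.8, "new": 0.2},
    "birthday": {"to": 0.6, "<end>": 0.4},
    "new":      {"year": 1.0},
    "to":       {"you": 1.0},
    "you":      {"<end>": 1.0},
    "year":     {"<end>": 1.0},
}

def generate(prompt_token, max_tokens=10):
    """Repeat the predict-and-append loop until a stop condition is met."""
    response = [prompt_token]
    for _ in range(max_tokens):          # maximum token limit
        probs = next_word_probs.get(response[-1])
        if probs is None:
            break
        words, weights = zip(*probs.items())
        token = random.choices(words, weights=weights)[0]  # sample next word
        if token == "<end>":             # predefined stopping criterion
            break
        response.append(token)
    return " ".join(response)

print(generate("happy"))
```

Note that nothing in the loop checks whether the output is true; it only continues with whatever is statistically likely, which is exactly why GenAI output needs sceptical review.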


Image and Music GenAI 

Image GenAI and music GenAI use a different type of ANN known as Generative Adversarial Networks (GANs) which can also be combined with Variational Autoencoders. Here, we focus on image GANs.  

GANs have two parts (two ‘adversaries’), the ‘generator’ and the ‘discriminator’. The generator creates a random image in response to the human-written prompt, and the discriminator tries to distinguish between this generated image and real images. The generator then uses the result of the discriminator to adjust its parameters, in order to create another image.  

This process is repeated, possibly thousands of times, with the generator making more and more realistic images that the discriminator is increasingly less able to distinguish from real images.

For example, a successful GAN trained on a dataset of thousands of landscape photographs might generate new but unreal images of landscapes that are almost indistinguishable from real photographs.  

Meanwhile, a GAN trained on a dataset of popular music (or even music by a single artist) might generate new pieces of music that are similar in structure and complexity to the original music, yet still different from it.
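The adversarial loop described above can be sketched on one-dimensional data using only NumPy. In this toy example the generator is a two-parameter linear map, the discriminator is a logistic classifier, and the “real” data are samples from an invented normal distribution; all parameters, learning rates and distributions are assumptions for the demo, not a production GAN.

```python
# A toy 1-D GAN: a generator and discriminator trained adversarially.
# All data and hyperparameters here are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
REAL_MEAN, REAL_STD = 4.0, 1.0        # the "real" data distribution

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

a, b = 1.0, 0.0                       # generator: fake sample = a*z + b
w, c = 0.0, 0.0                       # discriminator: D(x) = sigmoid(w*x + c)
lr, batch, steps = 0.02, 64, 5000

for _ in range(steps):
    # --- Discriminator step: learn to tell real samples from generated ones ---
    real = rng.normal(REAL_MEAN, REAL_STD, batch)
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    # Gradients of the binary cross-entropy loss for the discriminator.
    w -= lr * (-(1 - d_real) * real + d_fake * fake).mean()
    c -= lr * (-(1 - d_real) + d_fake).mean()
    # --- Generator step: adjust parameters to fool the updated discriminator ---
    z = rng.normal(0.0, 1.0, batch)
    fake = a * z + b
    d_fake = sigmoid(w * fake + c)
    # Gradient of -log D(fake) with respect to the generator's parameters.
    a -= lr * (-(1 - d_fake) * w * z).mean()
    b -= lr * (-(1 - d_fake) * w).mean()

generated = a * rng.normal(0.0, 1.0, 1000) + b
print(f"mean of generated samples: {generated.mean():.2f} (real mean: {REAL_MEAN})")
```

After training, the generated samples cluster near the real mean: the generator has learned to produce outputs the discriminator can no longer reliably distinguish from real data, which is the same dynamic that image and music GANs exploit at far larger scale.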



What are the strengths and weaknesses of GenAI?

Below are some of the benefits of GenAI, and some of the reasons why we should critically evaluate its outputs.

Strengths

GenAI can produce diverse and seemingly original outputs, creating content that may not have been seen before, based on patterns in the data it was trained on.

GenAI can process and interpret human language, allowing it to generate contextually relevant responses to user prompts.

GenAI can process and generate text in multiple languages.

GenAI can be fine-tuned for various tasks and domains, making it widely applicable (e.g. chatbots, content generation and language translation).

GenAI can learn patterns and representations from vast amounts of data, enabling it to capture nuances in language and generate outputs based on the patterns it has seen during training.

GenAI models can remember previous interactions, which results in more coherent and relevant conversation experiences for users.

GenAI can generate responses quickly, allowing for rapid interactions and real-time applications.

Weaknesses

GenAI can generate information that appears factual but is inaccurate.

It’s potentially dangerous that GenAI models appear to understand the content that they use and generate, but in reality they do not understand it. This could lead users to have misplaced trust in the GenAI output.

GenAI output imitates or summarises existing content - mostly without the permission of the intellectual property owners - but can give the appearance of creativity.

GenAI can produce content that is morally and ethically troubling, and its use can raise moral and ethical issues.

Training and running GenAI models can require significant computational and power resources.

The outputs of GenAI need to be moderated to establish ‘guardrails’ that prevent it generating inappropriate or offensive outputs. For ChatGPT, this was undertaken by poorly paid workers in Kenya, many of whom suffered mental health issues because of the disturbing generated output that they had witnessed.

GenAI can be used to automatically generate fake news and deep fakes.

GenAI is contributing to the digital divide. It relies on huge amounts of data and massive computing power, which is mostly only available to the largest international technology companies and a few economies. This means that the possibility to create and control GenAI is out of reach of most people, especially those in the Global South.

While we understand broadly how GenAI works, because of its complexity it is usually impossible to know why it produces particular outputs.

The output of GenAI is flooding the internet. This poses a recursive risk for future GPT models, which will themselves be trained on online content that earlier GPT models created (including all of its biases and errors).

GenAI tends to output standard answers that replicate the values of the creators of the data used to train the models. This may constrain the development of plural opinions and further marginalise already-marginalised voices.



Further information

Find a broad range of commentaries and resources to inform your own views. Note that the inclusion of a link on this page does not imply support for, or endorsement of, the views expressed.

Universities

Arizona State University (March 2023):

Russell Group (July 2023):

University of Cambridge (May 2023):

University of Leeds (undated):

Monash University (updated):

University of Sydney (updated):

Peter Bryant (University of Sydney Associate Dean Education) (January 2023):

Deakin University (March 2023):

Imperial College London (March 2023):

Academic and related organisations

QAA (January 2023):

National Centre for AI in Tertiary Education (JISC) (January 2023):

HEPI (May 2023):

QAA (May 2023):

National Centre for AI (JISC) (May 2023):

Sensemaking, AI, and Learning (SAIL) (May 2023):

Wolfram (February 2023):

National Centre for AI in Tertiary Education (JISC) (March 2023):

UNESCO (2023):

SEDA (March 2023): (recordings)

QAA (updated):

Media

Business Insider (January 2023):

TechCrunch (February 2023):

Feedback Fruits (December 2022):

Insider (August 2023):

New York Post (July 2023):

BBC (July 2023):

Reuters (April 2023):

Times Higher Education (July 2023):

Wonkhe (June 2023):

Wonkhe (July 2023):

Bounded Regret (June 2023):

FT (May 2023):

Educause Review (April 2023):

Inside Higher Ed (April 2023):

Wonkhe (April 2023):

Rachel Arthur Writes (April 2023):

Wonkhe (April 2023):

MIT Technology Review (April 2023):

Jim Dickinson (Wonkhe):

OpenAI (undated):

Times Higher Education (February 2023):

The Chronicle (March 2023):

#LTHEchat (March 2023):

The Conversation (February 2023):

ZDNet (February 2023):
