Artificial intelligence can already create images from text prompts, but scientists have now unveiled a gallery of pictures the technology produced by reading brain activity.
According to the Daily Mail, the new AI-powered algorithm reconstructed around 1,000 images, including a teddy bear and an airplane, from these brain scans with 80 percent accuracy.
Researchers from Osaka University used the popular Stable Diffusion model, a text-to-image system similar to OpenAI’s DALL-E 2, which can create imagery based on text inputs.
The team showed participants individual sets of images and collected fMRI (functional magnetic resonance imaging) scans, which the AI then decoded.
‘We show that our method can reconstruct high-resolution images with high semantic fidelity from human brain activity,’ the team shared in the study posted on the preprint server bioRxiv. ‘Unlike previous studies of image reconstruction, our method does not require training or fine-tuning of complex deep-learning models.’
The algorithm pulls information from parts of the brain involved in image perception, such as the occipital and temporal lobes, according to Yu Takagi, who led the research. The team used fMRI because it picks up blood flow changes in active brain areas, Science.org reports.
fMRI detects changes in blood oxygenation, so the scanners can see where in the brain our neurons — brain nerve cells — are working hardest (and drawing the most oxygen) while we have thoughts or emotions.
The study involved four participants, each of whom viewed a set of 10,000 images. The AI begins generating each image as noise resembling television static, then progressively replaces the noise with distinguishable features it detects in the brain activity, referring to the pictures it was trained on and finding a match.
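The noise-to-image refinement described above can be illustrated with a toy sketch. This is not the researchers’ actual pipeline — the function name and the simple linear blend are assumptions for illustration — but it shows the basic idea of starting from static-like noise and stepping toward a decoded target image.

```python
import numpy as np

def toy_denoise(target, steps=50, seed=0):
    """Toy diffusion-style refinement (illustration only, not the study's
    model): start from pure noise and, over a number of steps, blend the
    image toward a target that a decoder is assumed to have matched."""
    rng = np.random.default_rng(seed)
    img = rng.normal(size=target.shape)           # start as TV-static noise
    for t in range(steps):
        alpha = (t + 1) / steps                   # blend weight grows each step
        img = (1 - alpha) * img + alpha * target  # noise gradually replaced
    return img

# Stand-in for a "decoded" image; a real target would come from brain data.
target = np.full((8, 8), 0.5)
out = toy_denoise(target)
print(float(np.abs(out - target).max()))  # → 0.0 (final step matches target)
```

In a real latent diffusion model the update is learned by a neural network and runs in a compressed latent space, but the overall structure — iterative refinement from noise toward a conditioned target — is the same.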
‘We demonstrate that our simple framework can reconstruct high-resolution (512 x 512) images from brain activity with high semantic fidelity,’ according to the study.
‘We quantitatively interpret each component of an LDM [a latent diffusion model] from a neuroscience perspective by mapping specific components to distinct brain regions.
‘We present an objective interpretation of how the text-to-image conversion process implemented by an LDM incorporates the semantic information expressed by the conditional text while at the same time maintaining the appearance of the original image.’