Terrifyingly, Facebook wants its AI to be your eyes and ears
Facebook has announced a research project that aims to push the “frontier of first-person perception”, and in the process help you remember where you left your keys.
The Ego4D project provides a huge collection of first-person video and related data, plus a set of challenges for researchers to teach computers to understand the data and gather useful information from it.
In September, the social media giant launched a line of “smart glasses” called Ray-Ban Stories, which carry a digital camera and other features. Much like the Google Glass project, which met mixed reviews in 2013, this one has prompted complaints of privacy invasion.
The Ego4D project aims to develop software that will make smart glasses far more useful, but may in the process enable far greater breaches of privacy.
What is Ego4D?
Facebook describes the heart of the project as a massive-scale, egocentric dataset and benchmark suite collected across 74 worldwide locations and nine countries, with over 3,025 hours of daily-life activity video.
The “Ego” in Ego4D means egocentric (or “first-person” video), while “4D” stands for the three dimensions of space plus one more: time. In essence, Ego4D seeks to combine photos, video, geographical information and other data to build a model of the user’s world.
There are two components: a large dataset of first-person photos and videos, and a “benchmark suite” consisting of five challenging tasks that can be used to compare different AI models or algorithms with each other. These benchmarks involve analyzing first-person videos to remember past events, create diary entries, understand interactions with objects and people, and forecast future events.
The dataset includes more than 3,000 hours of first-person video from 855 participants going about everyday tasks, captured with a variety of devices including GoPro cameras and augmented reality (AR) glasses. The videos cover activities at home, in the workplace, and hundreds of social settings.
What is in the data set?
Although this is not the first such video dataset to be introduced to the research community, it is 20 times larger than publicly available datasets. It includes video, audio, 3D mesh scans of the environment, eye gaze, stereo, and synchronized multi-camera views of the same event.
Most of the recorded footage is unscripted or “in the wild”. The data is also quite diverse as it was collected from 74 locations across nine countries, and those capturing the data have various backgrounds, ages and genders.
What can we do with it?
Commonly, computer vision models are trained and tested on annotated images and videos for a specific task. Facebook argues that current AI datasets and models represent a third-person or a “spectator” view, resulting in limited visual perception. Understanding first-person video will help design robots that better engage with their surroundings.
Furthermore, Facebook argues egocentric vision can potentially transform how we use virtual and augmented reality devices such as glasses and headsets. If we can develop AI models that understand the world from a first-person viewpoint, just like humans do, VR and AR devices may become as valuable as our smartphones.
Can AI make our lives better?
Facebook has also developed five benchmark challenges as part of the Ego4D project. The challenges focus on first-person perception and aim to build a better understanding of video material so that useful AI assistants can be developed. As noted earlier, they involve analyzing first-person video to recall past events, create diary entries, understand interactions with objects and people, and forecast future events.
What about privacy?
Obviously, there are significant privacy concerns. If this technology is paired with smart glasses constantly recording and analyzing the environment, the result could be constant tracking and logging (via facial recognition) of people moving around in public.
While the above may sound dramatic, similar technology has already been trialed in China, and the potential dangers have been explored by journalists.
Facebook says it will maintain high ethical and privacy standards for the data gathered for the project, including consent of participants, independent reviews, and de-identifying data where possible.
As such, Facebook says the data was captured in a “controlled environment with informed consent”, and in public spaces “faces and other PII [personally identifying information] are blurred”.
But despite these reassurances (and noting this is only a trial), there are concerns over the future of smart-glasses technology coupled with the power of a social media giant whose intentions have not always been aligned with those of its users.
The future?
The ImageNet dataset, a huge collection of tagged images, has helped computers learn to analyze and describe images over the past decade or more. Will Ego4D do the same for first-person video?
We may get an idea next year. Facebook has invited the research community to participate in the Ego4D competition in June 2022, and pit their algorithms against the benchmark challenges to see if we can find those keys at last.
Article by Jumana Abu-Khalaf, Research Fellow in Computing and Security, Edith Cowan University and Paul Haskell-Dowland, Associate Dean (Computing and Security), Edith Cowan University
This article is republished from The Conversation under a Creative Commons license. Read the original article.
Google Research: Self-supervised learning is transforming medical imaging
Deep learning shows a lot of promise in healthcare, especially in medical imaging, where it can help improve the speed and accuracy of diagnosing patient conditions. But it also faces a serious barrier: The shortage of labeled training data.
In medical contexts, training data comes at great cost, which makes it very difficult to use deep learning for many applications.
To overcome this hurdle, scientists have explored different solutions with varying degrees of success. In a new paper, artificial intelligence researchers at Google suggest a new technique that uses self-supervised learning to train deep learning models for medical imaging. Early results show that the technique can reduce the need for annotated data and improve the performance of deep learning models in medical applications.
Supervised pre-training
Convolutional neural networks (CNNs) have proven to be very efficient at computer vision tasks. Google is one of several organizations that have been exploring their use in medical imaging. In recent years, the company’s research arm has built several medical imaging models in domains like ophthalmology, dermatology, mammography and pathology.
“There is a lot of excitement around applying deep learning to health, but it remains challenging because highly accurate and robust DL models are needed in an area like healthcare,” Shekoofeh Azizi, AI resident at Google Research and lead author of the self-supervised paper, told TechTalks.
One of the key challenges of deep learning is the need for huge amounts of annotated data. Large neural networks require millions of labeled examples to reach optimal accuracy. In medical settings, data labeling is a complicated and costly endeavor.
“Acquiring these ‘labels’ in medical settings is challenging for a variety of reasons: it can be time-consuming and expensive for clinical experts, and data must meet relevant privacy requirements before being shared,” Azizi said.
For some conditions, examples are scarce to begin with, and in others, such as breast cancer screening, it may take many years for the clinical outcomes to manifest after a medical image is taken.
Further complicating the data requirements of medical imaging applications are distribution shifts between training data and deployment environments, such as changes in the patient population, disease prevalence or presentation, and the medical technology used for imaging acquisition, Azizi added.
One popular way to address the shortage of medical data is to use supervised pre-training. In this approach, a CNN is initially trained on a dataset of labeled images such as ImageNet. This phase tunes the parameters of the model’s layers to the general patterns found in all kinds of images. The trained deep learning model can then be fine-tuned on a limited set of labeled examples for the target task.
Several studies have shown supervised pre-training to be helpful in applications such as medical imaging, where labeled data is scarce. However, supervised pre-training also has its limits.
“The common paradigm for training medical imaging models is transfer learning where models are first pre-trained using supervised learning on ImageNet. However, there is a large domain shift between natural images in ImageNet and medical images, and previous research has shown such supervised pre-training on ImageNet may not be optimal for developing medical imaging models,” Azizi said.
Self-supervised pre-training
Self-supervised learning has emerged as a promising area of research in recent years. In self-supervised learning, the deep learning models learn the representations of the training data without the need for labels. If done right, self-supervised learning can be of great advantage in domains where labeled data is scarce and unlabeled data is abundant.
Outside of medical settings, Google has developed several self-supervised learning techniques to train neural networks for computer vision tasks. Among them is the Simple Framework for Contrastive Learning (SimCLR), which was presented at the ICML 2020 conference. Contrastive learning uses different crops and variations of the same image to train a neural network until it learns representations that are robust to changes.
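To make the contrastive idea concrete, here is a minimal sketch of the NT-Xent loss described in the SimCLR paper, written in plain Python. The function names, the toy two-dimensional embeddings, and the temperature value are assumptions for illustration. A real setup would compute embeddings with a neural network over augmented image crops.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent_loss(views_a, views_b, temperature=0.5):
    """Average NT-Xent loss over a batch.

    views_a[i] and views_b[i] are embeddings of two augmented views of
    image i. Every other embedding in the batch acts as a negative.
    """
    embeddings = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        # Similarities to every embedding except the anchor itself.
        logits = [cosine(embeddings[i], embeddings[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(embeddings[i], embeddings[positive]) / temperature
        log_denominator = math.log(sum(math.exp(x) for x in logits))
        total += log_denominator - pos_logit  # equals -log(softmax of the positive)
    return total / (2 * n)

# Toy embeddings, made up for illustration only.
batch_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = nt_xent_loss(batch_a, [[0.9, 0.1], [0.1, 0.9]])   # views agree
shuffled = nt_xent_loss(batch_a, [[0.1, 0.9], [0.9, 0.1]])  # views disagree
```

The loss is lower when the two views of each image land close together and far from the other images, which is the pressure that pushes the network toward representations that survive cropping and other variations.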
In their new work, the Google Research team used a variation of the SimCLR framework called Multi-Instance Contrastive Learning (MICLe), which learns stronger representations by using multiple images of the same condition. This is often the case in medical datasets, where there are multiple images of the same patient, though the images might not be annotated for supervised learning.
“Unlabeled data is often available in large quantities in various medical domains. One important difference is that we utilize multiple views of the underlying pathology commonly present in medical imaging datasets to construct image pairs for contrastive self-supervised learning,” Azizi said.
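The pair construction Azizi describes can be sketched as follows. This is an illustrative reading of the idea and not the team's code: the case identifiers, file names, and the `build_positive_pairs` helper are hypothetical, and the fallback for cases with a single image (pairing the image with itself and relying on augmentation) is an assumption about how such cases would be handled.

```python
import random

def build_positive_pairs(images_by_case, rng):
    """Return one (case_id, image, image) positive pair per patient case."""
    pairs = []
    for case_id in sorted(images_by_case):
        images = images_by_case[case_id]
        if len(images) >= 2:
            # Two distinct images of the same case form the positive pair.
            first, second = rng.sample(images, 2)
        else:
            # Only one image available: pair it with itself. Augmentation
            # (not shown here) would make the two views differ.
            first = second = images[0]
        pairs.append((case_id, first, second))
    return pairs

# Hypothetical case and file identifiers.
cases = {
    "case_001": ["c1_front.png", "c1_side.png", "c1_close.png"],
    "case_002": ["c2_only.png"],
}
pairs = build_positive_pairs(cases, random.Random(0))
```

No diagnosis labels appear anywhere in this step. Knowing which images belong to the same case is enough to form the pairs.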
When a self-supervised deep learning model is trained on different viewing angles of the same target, it learns representations that are more robust to changes in viewpoint, imaging conditions, and other factors that might negatively affect its performance.
Putting it all together
The self-supervised learning framework the Google researchers used involved three steps. First, the target neural network was trained on examples from the ImageNet dataset using SimCLR. Next, the model was further trained using MICLe on a medical dataset that has multiple images for each patient. Finally, the model was fine-tuned on a limited dataset of labeled images for the target application.
The researchers tested the framework on two tasks: dermatology and chest X-ray interpretation. Compared to supervised pre-training, the self-supervised method provided a significant improvement in the accuracy, label efficiency, and out-of-distribution generalization of medical imaging models, which is especially important for clinical applications. And it required much less labeled data.
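The three steps can be written down as a simple ordered schedule. The sketch below only records what each stage trains on and whether it needs labels; it does not train anything. The `Stage` class, the stage names, and the `needs_labels` flags are assumptions made for illustration, based on the description above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage:
    name: str
    method: str
    data: str
    needs_labels: bool

# Order matters: each stage starts from the weights the previous one produced.
PIPELINE = (
    Stage("general pre-training", "SimCLR", "ImageNet images", needs_labels=False),
    Stage("domain pre-training", "MICLe", "medical images, several per patient", needs_labels=False),
    Stage("fine-tuning", "supervised training", "small labeled medical set", needs_labels=True),
)

def describe(pipeline):
    """Return one human-readable line per stage, in execution order."""
    lines = []
    for step, stage in enumerate(pipeline, start=1):
        label_note = "labels required" if stage.needs_labels else "no labels"
        lines.append(f"{step}. {stage.name}: {stage.method} on {stage.data} ({label_note})")
    return lines

schedule = describe(PIPELINE)
```

Laid out this way, it is easy to see that only the last stage consumes annotated data, which is where the savings in labeling effort would come from.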
“Using self-supervised learning, we show that we can significantly reduce the need for expensive annotated data to build medical image classification models,” Azizi said. In particular, on the dermatology task, they were able to train the neural networks to match the baseline model performance while using only a fifth of the annotated data.
“This hopefully translates to significant cost and time savings for developing medical AI models. We hope this method will inspire explorations in new healthcare applications where acquiring annotated data has been challenging,” Azizi said.
This article was originally published by Ben Dickson on TechTalks, a publication that examines trends in technology, how they affect the way we live and do business, and the problems they solve. But we also discuss the evil side of technology, the darker implications of new tech, and what we need to look out for. You can read the original article here.
The AI stories that made us smile in 2021 — and 5 that made us cry
AI had a memorable 2021, although not always for the best reasons. The field unleashed an arousing blend of breakthroughs, applications, and ideas — but also discharged a steady stream of bigotry, BS, and big tech barbarity.
At Neural, we aspire to be like Fox News in its prime: fair and balanced. In this equitable spirit, we’ve compiled an even mix of AI’s best and worst of 2021.
Without further ado, here are five stories that made us cherish our robot overlords — and five that had us reaching for the off switch.
10. Bad: Slaughterbots coming to your neighborhood
It was another busy year for AI weaponry. After DARPA tested algorithm-controlled jets in dogfights and a robot dog briefly joined the French military, we received a warning: “slaughterbots” will soon be on our streets unless the UN bans them.
9. Good: Turning the tone-deaf into rap stars
Computational creativity had a big 2021, offering a mix of inspiration and indignation to human artists. My favorite iteration was an app that turns your text into raps by legendary artists. It’s the closest I’ll ever come to spitting like Biggie Smalls — although Linkin Park’s Mike Shinoda put my own efforts to shame.
8. Bad: Cops running rampant with AI
If there’s a dystopian application of AI available, there’s a strong chance that the police want to try it. Countless cops can already use black-box AI to conduct unethical surveillance, generate evidence, and swerve constitutional protections, and it’s only going to get worse.
7. Good: Quantum AI could make our planet a paradise
It’s easy to focus on the worst of tech, particularly when you’re as jaded as I am, but there are reasons to be optimistic about the future. One is the potential of quantum AI to fight diseases, war, famine, and aging.
6. Bad: GPT-3’s bigotry
It wouldn’t be a worst-of-AI countdown without a mention of bigotry. Unfortunately, this year provided a range of horrors to choose from, from fears of machine-driven segregation to Facebook’s racist AI. I’ve plumped for one of GPT-3’s array of prejudices: the model’s “consistent and creative anti-Muslim bias.” This is one example of computational creativity we could do without.
5. Good: Searching for future diseases
During COVID-19, AI has promised much but delivered little. However, researchers have developed a tool that could help us prepare for the next pandemic: an AI-powered system that identifies diseases that could leap from animals to humans.
4. Bad: Dreams of driverless cars dying
Driverless cars were supposed to be dominating the roads by now, but the technical challenges are still proving hard to solve. The dream hasn’t died just yet, but it’s now on life support.
3. Good: Disrupting the ridiculous hearing aid market
The excitement over scientific breakthroughs and futurology can lead us to overlook some of the AI that can make a difference today. There are numerous examples, from BCIs turning paralyzed people’s thoughts into speech to this gadget for people with hearing loss.
2. Bad: The Google search algorithm
Sometimes, you need to experience a problem to truly understand it. Neural editor Tristan Greene did just that after discovering that Google News thinks he’s the queerest AI reporter in the world. That title shouldn’t be a source of shame, but the result was that the algorithm pigeonholed some people and overlooked others. Tristan later learned that he could somewhat game the system, but if he could do it, so could nefarious actors.
1. Good: A new approach to AI ethics
The fallout over Timnit Gebru’s firing from Google began last year, but the ramifications rumbled across 2021. The incident sparked concerns about diversity and AI ethics in tech, but it’s also produced positive outcomes. Exactly a year after Gebru lost her job at Google, she announced a new position: founder and executive director of DAIR, a lab that aims to make AI research independent from big tech. It’s an ambitious vision, but it sets a precedent for future ethics institutes.
Here’s hoping that 2022 brings more bold and positive AI developments — before the slaughterbots kill us all.