This week we are discussing a virtual band in VR, deep neural networks (DNNs) identifying sound effects, and actual humans translating video titles and descriptions.
Fake Musicians, Smart Machines and Human Intelligence
Visual Sound Effects
In a March 23, 2017 blog post entitled “Visualizing Sound Effects,” Noah Wang, a software engineer at Google, announced a working system that uses AI to identify sound effects like “applause, laughter and bells.”
Once identified, these “effects” are visualized by adding them to the auto-generated captions available for videos on YouTube.
“So what does this actually look like when you are watching a YouTube video? The sound effect is merged with the automatic speech recognition track and shown as part of standard automatic captions…
Click the CC button to see the sound effect captioning system in action.”
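How might that merge work in practice? Here is a minimal sketch of the idea, not YouTube’s actual pipeline: the `Cue` structure, the `merge_tracks` function, and the timings are all hypothetical, illustrating how detected sound-effect events could be interleaved with ASR caption cues by timestamp.

```python
from dataclasses import dataclass

@dataclass
class Cue:
    start: float  # seconds
    end: float
    text: str

def merge_tracks(asr_cues, effect_events):
    """Interleave detected sound-effect events (rendered as "[label]")
    with ASR caption cues, keeping everything in timestamp order."""
    effect_cues = [Cue(start, end, f"[{label}]")
                   for start, end, label in effect_events]
    return sorted(asr_cues + effect_cues, key=lambda cue: cue.start)

# Toy example: one spoken line followed by detected applause.
asr = [Cue(0.0, 2.5, "Thank you all for coming.")]
effects = [(2.5, 4.0, "applause")]
for cue in merge_tracks(asr, effects):
    print(f"{cue.start:4.1f}-{cue.end:4.1f}  {cue.text}")
```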
On the same day, Sourish Chaudhuri, also a software engineer at Google, published a related post, “Adding Sound Effect Information to YouTube Captions,” on the Google Research Blog. Sourish describes the process: “The DNN looks at short segments of audio and predicts whether that segment contains any one of the sound events of interest – since multiple sound effects can co-occur, our model makes a prediction at each time step for each of the sound effects.”
“(Left) The dense sequence of probabilities from our DNN for the occurrence over time of a single sound category in a video. (Center) Binarized segments based on the modified Viterbi algorithm. (Right) The duration-based filter removes segments that are shorter in duration than desired for the class.” – Google
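To make that pipeline concrete, here is a rough sketch under stated assumptions: per-frame sigmoid probabilities for each class, a plain threshold standing in for Google’s modified Viterbi binarization, and a simple duration filter at the end. The class list, frame length, and threshold values are all made up for illustration.

```python
import numpy as np

CLASSES = ["applause", "laughter", "bells"]  # assumed class list
FRAME_SEC = 0.1      # assumed analysis hop: one prediction per 100 ms
THRESHOLD = 0.5      # plain threshold standing in for modified Viterbi
MIN_DURATION = 0.5   # duration filter: drop detections shorter than this

def detect_events(probs):
    """probs has shape (num_frames, num_classes): one independent
    (sigmoid) probability per time step per sound effect, since the
    effects can co-occur. Returns (start_sec, end_sec, label) segments
    after binarizing and duration-filtering each class track."""
    events = []
    for c, label in enumerate(CLASSES):
        # Sentinel False flushes a run that extends to the final frame.
        active = list(probs[:, c] > THRESHOLD) + [False]
        start = None
        for i, on in enumerate(active):
            if on and start is None:
                start = i
            elif not on and start is not None:
                if (i - start) * FRAME_SEC >= MIN_DURATION:
                    events.append((round(start * FRAME_SEC, 2),
                                   round(i * FRAME_SEC, 2), label))
                start = None
    return sorted(events)

# Toy example: 30 frames (3 s) with a burst of applause probability.
probs = np.zeros((30, len(CLASSES)))
probs[10:22, 0] = 0.9  # applause active from 1.0 s to 2.2 s
print(detect_events(probs))  # -> [(1.0, 2.2, 'applause')]
```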
Getting It Wrong: Legal, Ethical and Moral Considerations
Because people with disabilities are entitled by law to the same material and information, and because machine learning (ML) systems can introduce errors, the team invested in a study investigating what happens when the DNN gets it wrong.
“This presented a surprising result: when sound effect information was incorrect, it did not detract from the participant’s experience in roughly 50% of the cases. Based upon participant feedback, the reasons for this appear to be:
* Participants who could hear the audio were able to ignore the inaccuracies.
* Participants who could not hear the audio interpreted the error as the presence of a sound event, and that they had not missed out on critical speech information.”
Powering Translation with Human Intelligence
Google has long allowed members of the YouTube community to provide translations for the captions found in its videos. In a March 30, 2017 blog post, Aviad Rozenhek, a YouTube product manager, announced that community members can now also translate the titles and descriptions of videos.
While community translations do represent a possible risk for brands that enable them, they are also an amazing way to engage with a broad multilingual community.
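The community-contribution workflow itself lives in YouTube’s web UI, but for the curious, published translations do surface programmatically as per-language localizations on a video in the YouTube Data API v3. A rough, read-only sketch follows; the API key and video ID are placeholders, not real values.

```python
# Read a video's published title/description translations via the
# YouTube Data API v3 (google-api-python-client).
from googleapiclient.discovery import build

API_KEY = "YOUR_API_KEY"    # placeholder
VIDEO_ID = "SOME_VIDEO_ID"  # placeholder

youtube = build("youtube", "v3", developerKey=API_KEY)
response = youtube.videos().list(
    part="snippet,localizations", id=VIDEO_ID
).execute()

for item in response.get("items", []):
    print("Default title:", item["snippet"]["title"])
    # localizations maps a language code to its translated title/description.
    for lang, loc in item.get("localizations", {}).items():
        print(f"  {lang}: {loc['title']}")
```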
Gorillaz New Video
The British virtual band Gorillaz released a new music video from their upcoming album Humanz. The video, released in both 2D and VR (360°) formats, gives creators a side-by-side look at the creative decisions and options that are possible in a video crafted for VR.
I highly recommend communicators watch both of these videos. Warning: while they contain pixelated cartoon nudity, I do feel both are safe for work. Together, these two variants of the same video have received over 25 million views in one week.
VR/360 Video
Traditional 2D video
It is worth noting that this virtual band has, for a number of years, actually been playing out in RL (real life).
Details on Wikipedia.