Committee Members

Full Committee

Full committee members were aware of the 2021 NeurIPS challenge secret tasks. Neither they nor their teammates were allowed to submit to the challenge.

Joseph Turian has over 6000 scientific citations for his refereed work. He recently received the Association for Computational Linguistics 10 Year Test of Time Award as lead author on his work on word representations, which ACL described as “exceptionally thorough, meticulous” and “half a decade ahead of its time”. He did postdoctoral research with Yoshua Bengio.

Jordie Shier is currently pursuing a master's degree in Computer Science and Music at the University of Victoria, Victoria, Canada, supervised by Dr. George Tzanetakis and Professor Kirk McNally. His master's thesis research focuses on the development of intelligent music production tools using techniques from the field of music information retrieval. His research is funded by a Canadian NSERC fellowship.

Bhiksha Raj is a Professor in the Language Technologies Institute of the School of Computer Science at Carnegie Mellon University. He also holds affiliate positions in the Electrical and Computer Engineering, Machine Learning and Music Technology departments. Prior to joining Carnegie Mellon in 2008, Dr. Raj was at Mitsubishi Electric Research Labs, and a part-time instructor at Harvard University’s Extension School. Dr. Raj’s research interests span Speech, Audio and Music processing, Machine Learning, and Deep Learning, and he has authored over 300 papers, patents, and multiple edited books on these topics. In his most recent work, Dr. Raj has pioneered the area of learning audio representations from weakly labelled data. Dr. Raj is a fellow of the IEEE.

Björn W. Schuller is Full Professor of Artificial Intelligence and the Head of GLAM – the Group of Language, Audio, & Music at Imperial College London/UK, Full Professor at the University of Augsburg/Germany, co-founding CEO and current CSO of audEERING – an Audio Intelligence company, amongst other Professorships and Affiliations. He is a Fellow of the IEEE and Golden Core Awardee of the IEEE Computer Society, Fellow of the BCS, Fellow of the ISCA, and President-Emeritus of the AAAC. He (co-)authored 1,000+ publications (36k+ citations, h-index=87), and is lead organiser of the Interspeech ComParE and ACM Multimedia AVEC challenge series, having organised more than 30 competitive research challenges overall.

Christian James Steinmetz is a PhD student working with Joshua Reiss within the Centre for Digital Music at Queen Mary University of London, where he is researching applications of machine learning for audio signal processing with a focus on high fidelity audio and music production. Previously, he was an intern at Facebook Reality Labs, Dolby Labs, Bose and Cirrus Logic.

Colin Malloy is an award-winning percussionist and composer specializing in contemporary solo and chamber percussion, the steel pan, and music technology. He is currently pursuing an interdisciplinary PhD in music and computer science at the University of Victoria in British Columbia.

George Tzanetakis is a Professor in the Department of Computer Science with cross-listed appointments in Electrical and Computer Engineering and Music at the University of Victoria, Victoria, Canada. He is the Canada Research Chair (Tier II) in the Computer Analysis of Audio and Music and received the Craigdarroch Research Award in Artistic Expression at the University of Victoria, Victoria, Canada, in 2012. His research spans all stages of audio content analysis, such as feature extraction, segmentation, and classification, with specific emphasis on music information retrieval.

Gissel Velarde, PhD in computer science and engineering, is an award-winning researcher, consultant and lecturer specializing in Artificial Intelligence. She was a research member in the European Commission’s project Learning to Create, was a lecturer at Aalborg University, and is a lecturer at Universidad Privada Boliviana. She worked for SONY Computer Science Laboratories and Moodagent, among others. Her doctoral thesis presents Convolutional Methods for Music Analysis. She developed machine learning and deep learning algorithms for classification, structural analysis, pattern discovery, and recommendation systems. In 2020, she was named “Notable Women” by the diversity promotion committee of the International Society for Music Information Retrieval.

Kirk McNally is Assistant Professor of Music Technology in the School of Music at the University of Victoria, Victoria, Canada. He is the program administrator for the undergraduate combined major program in music and computer science and the graduate program in music technology. His research and creative work have been supported by the Deutscher Akademischer Austausch Dienst (DAAD), the Canada Council for the Arts, the Banff Centre for Arts and Creativity, and the Social Sciences and Humanities Research Council of Canada (SSHRC).

Max Henry is a graduate researcher in Music Technology at McGill University, where he is a member of the Music Perception and Cognition Lab. After pursuing an undergraduate degree in jazz piano, he spent a decade playing synthesizer with the electronic rock band SUUNS. He has worked as an audio engineer, producer, and composer for television and film. He begins a graduate degree in Computer and Electrical Engineering at McGill in fall 2021.

Nicolas Pinto got his Neuroscience PhD from MIT in 2010 where he was the first to use GPUs for Neural Networks. He also taught GPU Programming to Computational Scientists at Harvard and MIT where he did his postdoc until 2012. He then founded Perceptio in 2014, the first privacy-preserving mobile Deep Learning startup. Apple acquired his startup in 2014, which formed the genesis of all Deep Learning technologies deployed on Apple’s current product line. Nicolas left Apple in early 2018 to focus on the intersection of Blockchain and AI for security, privacy and scalability of these systems. To this end, he recently co-founded Cygni Labs.

Yonatan Bisk is an Assistant Professor at CMU’s Language Technologies Institute. His research interests include multi-modal learning, and connecting language to perception and control. He has co-organized well-attended workshops at 2xNeurIPS, 2xNAACL, ECCV, and ACL.

Associate Committee

Associate committee members did not participate in experimental design, but assisted in technical aspects of the competition. They were aware of the secret tasks and neither they nor their teammates were allowed to submit to the NeurIPS challenge.

Gyanendra Das is a promising young researcher who is a sophomore at IIT(ISM) in Dhanbad, India. He is ranked in the top 1% globally in Kaggle competitions, and received 1st place in the Samsung Innovation Award 2020.

Humair Raj Khan completed his undergraduate degree at IIT Roorkee and has experience working as a graduate researcher as well as a machine learning engineer. He is passionate about solving problems with machine learning and is continuously exploring his interests across multiple domains.

Steering Committee

Steering committee members were not aware of the 2021 NeurIPS challenge secret tasks. They were not allowed to submit to the challenge, but their teammates were.

Camille Noufi is a vocalist, research engineer, and current PhD candidate in the Center for Computer Research in Music and Acoustics (CCRMA) at Stanford University. Her research utilizes signal processing, machine learning and human-computer interaction techniques to study the nuances of vocal expression, production and perception. In 2020, she was a Research Intern with the Audio Team at Facebook Reality Labs.

Dorien Herremans is an Assistant Professor at Singapore University of Technology and Design, where she is also Director of Game Lab. Dorien had a joint appointment at the Institute of High Performance Computing, A*STAR from 2017 to 2020 and worked as a certified instructor for the NVIDIA Deep Learning Institute. Before joining SUTD, she was a Marie Sklodowska-Curie Postdoctoral Fellow at the Centre for Digital Music at Queen Mary University of London. Dr. Herremans’ research interests include machine learning and music for automatic music generation, data mining for music classification (hit prediction) and novel applications at the intersection of machine learning/optimization and music.

Eduardo Fonseca is a PhD candidate at the Music Technology Group of Universitat Pompeu Fabra, under the supervision of Dr. Xavier Serra. His research focuses on audio dataset creation and learning algorithms for sound event recognition, including learning with noisy labels and self-supervision. His work received one of the best paper awards at WASPAA21. He has interned at Google Research twice and has been actively involved in the DCASE community as Challenge Task organizer and Technical Program Co-Chair.

Jesse Engel is lead research scientist on Magenta, a research team within Google Brain exploring the role of machine learning in creative applications. He did his Bachelors and Ph.D. at UC Berkeley, studying the Martian atmosphere and quantum dot nanoelectronics respectively, and a joint postdoc at Berkeley and Stanford on neuromorphic computing. Afterward, he worked with Andrew Ng to help found the Baidu Silicon Valley AI Lab and was a key contributor to DeepSpeech 2, a speech recognition system named one of the ‘Top 10 Breakthroughs of 2016’ by MIT Technology Review. He joined Google Brain in 2016, where his research on Magenta includes creating new generative models for audio (DDSP, NSynth), symbolic music (MusicVAE, GrooVAE), adapting to user preferences (Latent Constraints, MIDI-Me), and work to close the gap between research and musical applications (NSynth Super, Magenta Studio).

Justin Salamon is a senior research scientist and member of the Audio Research Group at Adobe Research in San Francisco. Previously he was a senior research scientist at the Music and Audio Research Laboratory and Center for Urban Science and Progress of New York University. His research focuses on the application of machine learning and signal processing to audio & video, with applications in machine listening, audiovisual and multi-modal understanding, representation learning & self-supervision, audio for video, music information retrieval, bioacoustics, environmental sound analysis and open source software & data.

Philippe Esling is an associate professor and researcher in machine learning and artificial intelligence applied to music at IRCAM, where he is head of the ACIDS research group. He teaches computer science and mathematics at Sorbonne Université (formerly UPMC - Paris 6) and machine learning in the ATIAM Masters. He also participates in ecological monitoring and metagenetics research with the University of Geneva (UNIGE).

Pranay Manocha is a PhD student in Computer Science at Princeton University, where he is a member of the machine learning group supervised by Dr. Adam Finkelstein. His PhD research has mainly focused on relating audio perception and machine learning, and to that end on designing perceptual objective metrics for evaluating audio quality. His work was one of the best paper finalists at Interspeech 2020, and in the past, he has been a recipient of the SN BOSE fellowship and the Indian Academy of Sciences summer research fellowship.

Shinji Watanabe is an Associate Professor at Carnegie Mellon University, Pittsburgh, PA. He received his B.S., M.S., and Ph.D. (Dr. Eng.) degrees from Waseda University, Tokyo, Japan. He was a research scientist at NTT Communication Science Laboratories, Kyoto, Japan, from 2001 to 2011, a visiting scholar at the Georgia Institute of Technology in 2009, a senior principal research scientist at Mitsubishi Electric Research Laboratories (MERL) from 2012 to 2017, and an associate research professor at Johns Hopkins University from 2017 to 2020. He has published more than 200 papers in peer-reviewed journals and conferences and received several awards, including the best paper award from the IEEE ASRU in 2019. He served as an Associate Editor of the IEEE Transactions on Audio, Speech, and Language Processing. He has been a member of several technical committees, including the APSIPA Speech, Language, and Audio Technical Committee (SLA), IEEE Signal Processing Society Speech and Language Technical Committee (SLTC), and Machine Learning for Signal Processing Technical Committee (MLSP).

Zeyu Jin is a research scientist at Adobe Research in San Francisco. His research interests are in speech and music synthesis, deep learning, and human-computer interaction. He received a Ph.D. degree in computer science from Princeton University, advised by Adam Finkelstein, and an M.S. in music technology from Carnegie Mellon University. Between 2015 and 2017, he interned at Adobe three times and presented his primary research project, VoCo, at Adobe MAX Sneaks in 2016.