A Large-Scale Tag-Pair-Based Audio Dataset

30.8 thousand audio files

385.5 hours of labelled audio

1,116 classes: 758 Adjective-Noun and 358 Verb-Noun Pairs

Collecting and Processing the AudioPairBank

We looked into existing ontologies containing adjectives, nouns and verbs to collect a list of 10,829 adjective-noun pairs and 9,996 verb-noun pairs. An ontology by Davies [32] defined three audio semantic levels (sound sources, sound modifiers and soundscape modifiers) based on a study in which participants were asked to describe sounds using nouns, verbs and adjectives. Additionally, Axelsson [33] suggested a list of adjectives to describe the feelings that sounds produce in individuals. Two further ontologies, introduced by Schafer [21] and Gygi [22], are based on soundscapes and environmental sounds, where sounds were labeled by their generating source using verbs, such as "baby crying" and "cat meowing". Lastly, we considered the Visual Sentiment Ontology (VSO) presented in [24], which is a collection of adjective-noun pairs (ANPs) based on 24 emotions defined in Plutchik's Wheel of Emotions. However, we discarded the pairs that did not apply to audio, such as "fresh food" or "favorite book". After inspecting the final list of pairs, we noticed lexical variations of the same pair, which we grouped together. You can explore the resulting folksonomy using the following link.
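
The filtering and grouping steps described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the AudioPairBank: the exclusion list, the variant table, and the function names `normalize` and `group_pairs` are all hypothetical, chosen only to show how pairs that do not apply to audio might be dropped and how lexical variations of a pair might be collected under one canonical key.

```python
from collections import defaultdict

# Hypothetical exclusion list: pairs judged not to apply to audio.
NON_AUDIO_PAIRS = {("fresh", "food"), ("favorite", "book")}

# Hypothetical table mapping lexical variants to one canonical form.
VARIANTS = {"favourite": "favorite", "babies": "baby", "cats": "cat"}


def normalize(word):
    """Lowercase, trim, and map known variants to a canonical form."""
    word = word.strip().lower()
    return VARIANTS.get(word, word)


def group_pairs(raw_pairs):
    """Drop non-audio pairs and group lexical variations under one key."""
    groups = defaultdict(list)
    for modifier, noun in raw_pairs:
        key = (normalize(modifier), normalize(noun))
        if key in NON_AUDIO_PAIRS:
            continue  # e.g. "fresh food" has no audible meaning
        groups[key].append((modifier, noun))
    return dict(groups)


# Illustrative input only; these are not entries from the dataset.
raw = [("crying", "Baby"), ("Crying", "babies"), ("meowing", "cat"),
       ("fresh", "food"), ("Favourite", "book")]
grouped = group_pairs(raw)
```

In this sketch the key is the normalized pair, so every original spelling remains available under its group and the mapping can be reviewed by hand, in the spirit of the manual inspection mentioned above.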

How to get the AudioPairBank

At present, the dataset is partially accessible through our browser. Please check the download page for additional information.