Published May 29, 2014 | Version 1
Project deliverable (open access)

TWO!EARS Deliverable D3.1 - Software Architecture (WP3: Feature extraction, object formation & meaning assignment; FP7-ICT-2013-C TWO!EARS FET-Open Project 618075)

  • 1. Speech and Hearing Research Group, University of Sheffield, UK
  • 2. Hearing Systems Group, Technical University of Denmark, Copenhagen, Denmark
  • 3. Institute of Communication Acoustics, Ruhr-University Bochum, Germany
  • 4. Neural Information Processing Group, Technische Universität Berlin, Germany
  • 5. Communication Acoustics and Aural Architecture Research Laboratory, Rensselaer Polytechnic Institute, Troy, USA
  • 6. Multimodal Active Perception Group, Université Pierre et Marie Curie, Paris, France

Description

The goal of the Two!Ears project is to develop an intelligent, active computational model of auditory perception and experience in a multi-modal context. At the heart of the project is a software architecture that optimally fuses prior knowledge with the currently available sensor input, in order to find the best explanation of all available information. Top-down feedback plays a crucial role in this process. The software architecture will be implemented on a mobile robot endowed with a binaural head and stereo cameras, allowing for active exploration and understanding of audiovisual scenes.

This deliverable sets out the design of the software architecture, with an emphasis on communication between the components of the system. An object-oriented approach is used throughout, giving benefits of reusability, encapsulation and extensibility.

The first stage of the system architecture concerns bottom-up auditory signal processing, which transforms the signals arriving at the binaural head into auditory cues. Bottom-up signal processing is implemented as a collection of processor modules, which are instantiated and routed by a manager object. This affords great flexibility, and allows real-time modification of bottom-up processing in response to feedback from higher levels of the system. Processor modules are provided to compute cues such as rate maps, interaural time and level differences, interaural coherence, and onsets and offsets.
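The processor/manager pattern described above can be sketched in a few lines of Python. All class names, the dependency mechanism, and the toy cue computations below are illustrative assumptions, not the project's actual interfaces: the point is only that a manager instantiates requested processors, satisfies their dependencies, and routes cue data between them.

```python
from abc import ABC, abstractmethod


class Processor(ABC):
    """Base class for a bottom-up cue processor (hypothetical interface)."""

    cue = None        # name of the cue this processor produces
    depends = ()      # cues this processor requires as input

    @abstractmethod
    def process(self, inputs):
        """Compute this processor's cue from the available cue data."""


class RateMapProcessor(Processor):
    cue = "ratemap"

    def process(self, inputs):
        signal = inputs["signal"]
        # Toy stand-in for a rate map: mean absolute amplitude per 4-sample frame.
        frames = [signal[i:i + 4] for i in range(0, len(signal), 4)]
        return [sum(abs(s) for s in f) / len(f) for f in frames]


class OnsetProcessor(Processor):
    cue = "onsets"
    depends = ("ratemap",)

    def process(self, inputs):
        ratemap = inputs["ratemap"]
        # Toy stand-in for onset detection: frames where energy rises.
        return [j for j in range(1, len(ratemap)) if ratemap[j] > ratemap[j - 1]]


class Manager:
    """Instantiates requested processors and routes data between them."""

    registry = {"ratemap": RateMapProcessor, "onsets": OnsetProcessor}

    def __init__(self, requests):
        self.processors = {}       # insertion order = execution order
        for cue in requests:
            self._instantiate(cue)

    def _instantiate(self, cue):
        if cue in self.processors:
            return
        proc = self.registry[cue]()
        for dep in proc.depends:   # recursively satisfy dependencies first
            self._instantiate(dep)
        self.processors[cue] = proc

    def run(self, signal):
        cues = {"signal": signal}
        for cue, proc in self.processors.items():
            cues[cue] = proc.process(cues)
        return cues
```

Requesting only the "onsets" cue causes the manager to instantiate the rate-map processor as well, which is the kind of automatic routing that also lets higher levels of the system reconfigure the bottom-up chain at run time.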

Bottom-up cues are provided as input to a blackboard system, which consists of a collection of independent knowledge sources (KS) that communicate by reading and writing data on a globally-accessible data structure (the blackboard). The blackboard is divided into layers, which describe hypotheses at different levels of abstraction. Our blackboard system uses an event-driven design for efficiency: when data is placed on the blackboard, an event is broadcast, to which any KS with a matching precondition can respond.
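A minimal sketch of this event-driven blackboard, again in Python with illustrative names (the layer names, the `LocationKS` example, and its toy ITD-to-azimuth mapping are assumptions, not the actual system):

```python
class Blackboard:
    """Globally accessible data structure, divided into named layers.
    Writing to a layer broadcasts an event to subscribed knowledge sources."""

    def __init__(self):
        self.layers = {}        # layer name -> list of hypotheses
        self.subscribers = {}   # layer name -> list of KS callbacks

    def subscribe(self, layer, callback):
        self.subscribers.setdefault(layer, []).append(callback)

    def write(self, layer, hypothesis):
        self.layers.setdefault(layer, []).append(hypothesis)
        for callback in self.subscribers.get(layer, []):
            callback(self, hypothesis)      # event-driven KS execution


class LocationKS:
    """Knowledge source triggered by new cue data on the blackboard."""

    def attach(self, blackboard):
        blackboard.subscribe("cues", self.execute)

    def execute(self, blackboard, cue):
        if "itd" in cue:                    # precondition: cue carries an ITD
            azimuth = cue["itd"] * 100.0    # toy mapping from ITD to azimuth
            blackboard.write("locations", {"azimuth": azimuth})
```

Note that a KS whose precondition does not match (here, a cue with no ITD) simply ignores the event, so knowledge sources stay independent of one another and communicate only through the blackboard layers.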

Graphical models play an important role in the blackboard. Knowledge sources can access a general graphical model maintained by the blackboard, and can perform inference on particular nodes in order to generate new evidence. A KS may also contain its own graphical model.

As a proof of concept, a specific instantiation of the software architecture is described which localises and identifies a single sound source. It is shown that top-down feedback in the system plays a crucial role when front/back confusions occur, prompting head movements that allow the confusions to be resolved.
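The role of feedback in resolving front/back confusions can be illustrated with a small worked example. The sketch below is a generic discrete Bayesian update over two azimuth hypotheses; the particular angles and likelihood values are invented for illustration and do not come from the deliverable.

```python
def normalise(dist):
    """Scale a dictionary of non-negative scores so its values sum to 1."""
    z = sum(dist.values())
    return {k: v / z for k, v in dist.items()}


def bayes_update(prior, likelihood):
    """Discrete Bayesian update: posterior is proportional to prior times likelihood."""
    return normalise({az: prior[az] * likelihood.get(az, 0.0) for az in prior})


# Two azimuth hypotheses that interaural cues cannot distinguish: a source
# at 30 degrees in front also matches 150 degrees behind (front/back confusion).
prior = {30: 0.5, 150: 0.5}

# After a head movement the ambiguity breaks: the observed cues are now far
# more likely under the frontal hypothesis (toy likelihood values).
likelihood_after_turn = {30: 0.9, 150: 0.1}

posterior = bayes_update(prior, likelihood_after_turn)
```

Before the head movement, neither hypothesis can be preferred; after it, the posterior concentrates on the frontal source, which is the mechanism by which top-down feedback (a prompted head movement) resolves the confusion.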

It should be noted that the current document includes deliverable D2.1 in order to give a complete overview of the Two!Ears software architecture. This document should therefore be considered the most comprehensive account of the software architecture as of 31st May 2014.

Files

D31_full_software_architecture.pdf (2.2 MB)


Additional details

Funding

TWO!EARS – Grant agreement no. 618075
European Commission