Analysis and Modeling of Visual Invariance for Object Recognition and Spatial Cognition
File | Description | Size | Format
---|---|---|---
00104531-1.pdf | | 34.18 MB | Adobe PDF
Other title: | Analyse und Modellierung visueller Invarianz zur Objekterkennung und Raumkognition
Author: | Eberhardt, Sven
Supervisor: | Schill, Kerstin
1st reviewer: | Schill, Kerstin
Further reviewers: | Fahle, Manfred
Abstract: | The human visual system is unmatched by machine imitations in its universal ability to perform a great number of complex tasks such as object detection, tracking and categorization, scene perception and localization seemingly effortlessly and instantly. It can quickly adapt to novel problems, learn concepts from few samples, build and reason on abstract representations, and merge information from multiple senses. Of particular interest is processing in the ventral stream of human visual cortex, because it solves a multitude of complex scene analysis tasks in under 200 milliseconds. Here, a functional, dataset-driven analysis approach is followed. Feature outputs from several specialized vision models, including Textons, Gist, HMax, SIFT and Spatial Pyramids, are analyzed for their diagnosticity on a number of tasks typically attributed to human ventral stream processing. A strong performance dissociation between models and tasks, dependent on invariance properties, is found. From these findings, a conceptual space is proposed into which both vision models and associated task requirements are placed based on local and global invariance dimensions. Following this concept, a general-purpose, hierarchical vision model is suggested in which specialization is realized as tuning of receptive field ranges and task-dependent weights. As an example application of this conceptual space, the task of vision-based localization is cast as a classification problem. Place categorization in several contexts, including indoor, outdoor and virtual-world environments, is sorted into the conceptual space of vision requirements.
From this, a universal descriptor called *Signature of a Place* is introduced which outperforms baseline models on all localization tasks. Correlation with human performance is tested, yielding an orthogonal result. The question of self-organized learning in hierarchical systems is analyzed, and a novel approach utilizing cross-modal feature training between visual and auditory cues in a deep learning hierarchy is presented. The model is able to generate audio predictions from video input and explains previous human psychophysics results on multi-modal difference thresholds.
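The dataset-driven analysis described in the abstract evaluates feature descriptors by how well a simple classifier can solve a task on their outputs. A minimal sketch of that methodology follows, using a crude block-averaged gradient-energy descriptor as a stand-in (it is not the Gist, HMax, or Signature-of-a-Place models from the thesis) and a nearest-centroid classifier on synthetic "place" images; all names and parameters here are illustrative assumptions.

```python
import numpy as np

def gistlike_descriptor(img, grid=4):
    """Stand-in global descriptor (NOT the thesis models): mean gradient
    energy per cell of a grid x grid partition of the image."""
    gy, gx = np.gradient(img.astype(float))
    energy = gx ** 2 + gy ** 2
    h, w = energy.shape
    return np.array([
        energy[i * h // grid:(i + 1) * h // grid,
               j * w // grid:(j + 1) * w // grid].mean()
        for i in range(grid) for j in range(grid)
    ])

def fit_centroids(X, y):
    """Nearest-centroid 'diagnosticity probe': one mean vector per class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict(X, classes, centroids):
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]

# Two synthetic "place categories": smooth luminance ramps vs. noise textures.
rng = np.random.default_rng(0)
ramp = np.outer(np.linspace(0, 1, 32), np.linspace(0, 1, 32))
imgs = [ramp + 0.05 * rng.standard_normal((32, 32)) for _ in range(20)]
imgs += [rng.standard_normal((32, 32)) for _ in range(20)]
labels = np.array([0] * 20 + [1] * 20)

X = np.stack([gistlike_descriptor(im) for im in imgs])
classes, cent = fit_centroids(X[::2], labels[::2])     # train on every other image
acc = (predict(X[1::2], classes, cent) == labels[1::2]).mean()
```

A descriptor is "diagnostic" for a task to the extent that `acc` exceeds chance on held-out samples; in the thesis, this kind of probe is run across many models and tasks to reveal the invariance-dependent dissociation.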
Keywords: | dissertation; vision; visual system; localization; spatial cognition; object recognition; deep learning; modeling; image processing | Publication date: | 29-May-2015 | Document type: | Dissertation | Secondary publication: | no | URN: | urn:nbn:de:gbv:46-00104531-10 | Institution: | Universität Bremen | Faculty: | Faculty 03: Mathematics/Computer Science (FB 03) |
Appears in collections: | Dissertationen |
Page views: 374 (checked on 25.12.2024)
Downloads: 86 (checked on 25.12.2024)