Citation link (URN)
https://nbn-resolving.de/urn:nbn:de:gbv:46-00104531-10

Analysis and Modeling of Visual Invariance for Object Recognition and Spatial Cognition

Publication date
2015-05-29
Authors
Eberhardt, Sven
Supervisor
Schill, Kerstin
Reviewer
Fahle, Manfred
Abstract
The human visual system is unmatched by its machine counterparts in its universal ability to perform a great number of complex tasks such as object detection, tracking and categorization, scene perception and localization, seemingly effortlessly and instantly. It can quickly adapt to novel problems, learn concepts from few samples, build and reason on abstract representations, and merge information from multiple senses. Of particular interest is processing in the ventral stream of the human visual cortex, because it solves a multitude of complex scene analysis tasks in under 200 milliseconds. Here, a functional, dataset-driven analysis approach is followed. Feature outputs from several specialized vision models, including Textons, Gist, HMax, SIFT and Spatial Pyramids, are analyzed for their diagnosticity on a number of tasks typically attributed to human ventral stream processing. A strong performance dissociation between models and tasks, dependent on invariance properties, is found. From these findings, a conceptual space is proposed into which both vision models and associated task requirements are placed based on local and global invariance dimensions. Following this concept, a general-purpose, hierarchical vision model is suggested in which specialization is realized as tuning of receptive field ranges and task-dependent weights. As an example of an application of this conceptual space, the specific task of vision-based localization is cast as a classification problem. Place categorization in several contexts, including indoor, outdoor and virtual-world environments, is sorted into the conceptual space of vision requirements. From this, a universal descriptor called "Signature of a Place" is introduced which outperforms baseline models on all localization tasks. Correlation with human performance is tested, yielding an orthogonal result. The question of self-organized learning in hierarchical systems is analyzed, and a novel approach utilizing cross-modal feature training between visual and auditory cues in a deep learning hierarchy is presented. The model is able to generate audio predictions from video input and to explain previous human psychophysics results on multi-modal difference thresholds.
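As a rough illustration of the dataset-driven evaluation approach described in the abstract (not the code used in the thesis), the following minimal Python sketch shows how the diagnosticity of a precomputed feature representation for a given task could be estimated with a cross-validated linear readout. The feature arrays, label vector, and the choice of a scikit-learn linear SVM are all assumptions made for illustration; the actual feature models (Textons, Gist, HMax, SIFT, Spatial Pyramids) would supply X in practice.

# Minimal sketch: estimating how diagnostic a feature representation is for a
# visual task by training a linear readout on top of precomputed features.
# Feature extraction itself is assumed to happen elsewhere; X is an
# (n_samples, n_features) array and y holds the task labels
# (e.g. object categories or place identities).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def diagnosticity(X: np.ndarray, y: np.ndarray, folds: int = 5) -> float:
    """Mean cross-validated accuracy of a linear readout on features X for task y."""
    clf = make_pipeline(StandardScaler(), LinearSVC())
    return float(cross_val_score(clf, X, y, cv=folds).mean())

# Placeholder random features for two hypothetical feature models;
# real experiments would compare actual model outputs on the same images.
rng = np.random.default_rng(0)
y = rng.integers(0, 10, size=500)            # 10 task categories
X_model_a = rng.normal(size=(500, 512))      # stand-in for e.g. Gist features
X_model_b = rng.normal(size=(500, 1024))     # stand-in for e.g. HMax features
print("model A:", diagnosticity(X_model_a, y))
print("model B:", diagnosticity(X_model_b, y))

Comparing such scores across feature models and tasks is one simple way to expose the kind of performance dissociation between models and task invariance requirements that the abstract refers to.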
Keywords
dissertation; vision; visual system; localization; spatial cognition; object recognition; deep learning; modeling; image processing
Institution
Universität Bremen  
Faculty
Fachbereich 03: Mathematik/Informatik (FB 03)
Document type
Dissertation
Secondary publication
No
Language
English
Files
Name: 00104531-1.pdf
Size: 33.37 MB
Format: Adobe PDF
Checksum (MD5): 38f712180badce89be94266accfa7899
