Citation link (DOI)
10.26092/elib/1517

OCR Report

Publication date
2021-02
Authors
Skitalinskaya, Gabriella  
Düpont, Nils  
Abstract
Many social science researchers face the challenge of working with textual data that is available only on paper or in poorly scanned PDF files. Obtaining results good enough for further automated text post-processing requires knowledge of image processing techniques and optical character recognition (OCR) software. Based on sample scans from researchers at the Collaborative Research Center “Global Dynamics of Social Policy” (SFB 1342), we compare the results of several open-source and commercial OCR tools. We evaluate each tool’s performance across three tasks: extracting plain text, recognizing text style and structure (hOCR), and extracting tables, where we focus not only on the ability to accurately retrieve data from each cell but also on the ability to properly capture the table layout. In this report, we summarize our findings and give recommendations to consider when planning OCR projects.
Keywords
optical character recognition; computational social sciences; software tools
Institution
Universität Bremen  
Department
Zentrale Wissenschaftliche Einrichtungen und Kooperationen  
Institute
SFB Globale Entwicklungsdynamiken von Sozialpolitik (SFB 1342)  
Document type
Report
Series
Wesis - technical papers  
Volume
7
Secondary publication
No
License
https://creativecommons.org/licenses/by-nc-nd/4.0/
Language
English
Files
Name
WeSIS_Technical_Papers_No 07_Skitalinskaya et al.pdf
Size
1.59 MB
Format
Adobe PDF
Checksum (MD5)
87a7ac9c5fb28f92dbb8c7d9114ff68c
