The Resource Finding relevant PDF medical journal articles by the content of their figures as well as their text, by Ammon Christiansen

Label
Finding relevant PDF medical journal articles by the content of their figures as well as their text
Title
Finding relevant PDF medical journal articles by the content of their figures as well as their text
Statement of responsibility
by Ammon Christiansen
Creator
Subject
Genre
Language
eng
Summary
  • This work addresses the need for an alternative to keyword-based search for sifting through large collections of PDF medical journal articles during literature review. Despite users' best efforts to form precise and accurate queries, it is often difficult to guess the right keywords to find all the related articles while retrieving a minimum number of unrelated ones. Failing to find relevant, related research during literature review wastes research time and effort and can mean missing significant work in the area, which could affect the quality of the research being conducted.
  • The purpose of this work is to explore the benefits of a retrieval system for professional journal articles in PDF format that supports hybrid queries composed of both text and images. PDF medical journal articles contain formatting and layout information that implies the structure and organization of the document. They also contain figures and tables rich with content and meaning. Stripping a PDF into "full-text" for indexing purposes disregards these important features.
  • Specifically, this work investigated the following: (1) what effect the incorporation of a document's embedded figures into the query (in addition to its text) has on retrieval performance (precision) compared to plain keyword-based search; (2) how current text-based document-query similarity methods can be enhanced by using formatting and font-size information as a structure and organization model for a PDF document; (3) whether to use the standard Euclidean distance function or the matrix distance function for content-based image retrieval; (4) how to convert a pure-layout PDF document into a structured, formatted, reflowable XML representation; (5) what document views (such as a term frequency cloud, a document outline, or a document's figures) would help users wade through search results to quickly select those that are worth a closer look.
  • Although the experimental results were unexpectedly worse than their comparison baselines (see the conclusion for a summary), the experimental methods are valuable in showing others which directions have already been pursued, why they did not work, and what problems remain to be solved in order to improve literature review through a hybrid text and image retrieval system.
Cataloging source
UPB
Degree
M.S.
Dissertation year
2007
Granting institution
Brigham Young University. Dept. of Electrical and Computer Engineering
Illustrations
illustrations
Index
no index present
Literary form
non fiction
Nature of contents
  • bibliography
  • theses
Label
Finding relevant PDF medical journal articles by the content of their figures as well as their text, by Ammon Christiansen
Link
http://hdl.lib.byu.edu/1877/etd1815
Instantiates
Publication
Bibliography note
Includes bibliographical references (p. 95-98)
Carrier category
volume
Carrier MARC source
rdacarrier
Content category
text
Content type MARC source
rdacontent
Dimensions
28 cm.
Extent
xxii, 98 p.
Media category
unmediated
Media MARC source
rdamedia
Other physical details
ill. (some col.)
System control number
UtOrBLW
