SCS Researchers Take Aim at Bridging Long-Standing PDF Accessibility Gaps

04/09/2026    Mallory Lindahl

The Breakdown: 

  • iTagPDF automatically adds accessibility metadata to LaTeX-based PDFs.
  • It combines visual layout analysis with original source document information.
  • The system improves accuracy and reduces the need for manual tagging.

* * *

Researchers in the Carnegie Mellon University School of Computer Science have developed a system that automatically improves PDF accessibility.

Although PDFs are used in almost every field, from research to government documents, the majority are inaccessible to people with visual or motor impairments. Screen readers and other accessibility tools rely on digital tags to interpret content order, text and images. Because PDFs are static documents, adding the tags that make them accessible is time-consuming, technically complex manual work. As a result, nearly 95 percent of publicly available PDFs remain inaccessible.

The CMU project, known as iTagPDF, represents a meaningful step toward fully automated PDF accessibility, a goal that has remained elusive despite decades of effort. It also demonstrates how advances in artificial intelligence and computer vision can lead to tools that better recognize document structure, making PDFs easier for assistive technologies to read and navigate.

“Improving PDF accessibility does not just help people with disabilities,” said Jeff Bigham, an associate professor in CMU’s Human-Computer Interaction Institute and part of the research team. “It improves how everyone interacts with information by making documents more structured, searchable and easier for both people and AI systems to understand.”

Most PDFs lack the tagging, logical reading order and descriptive information that screen readers and other assistive technologies need to interpret a document accurately. This metadata includes elements like a document’s title, author and keywords, along with structural information that tells assistive tools how to navigate the content.

Peya Mowar (left), Aaron Steinfeld (center) and Jeff Bigham (right) created iTagPDF to help improve accessibility for LaTeX-based PDFs.

iTagPDF combines visual analysis with information from original source documents to rethink how accessibility metadata is generated for LaTeX-based PDFs, such as research papers. Rather than relying solely on the final PDF, the system analyzes both visual layout and embedded document semantics to identify elements like headings, paragraphs, figures and tables; determine reading order; and generate content-specific metadata such as alternative text descriptions for images. This approach could eventually extend beyond LaTeX to improve accessibility across a broader range of PDFs.
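To make the idea concrete, here is a toy Python sketch of how layout information and source information might be combined. It is not iTagPDF's code or method: the `Block` class, the `SOURCE_ROLES` lookup, the `tag_blocks` function and the two-column split at x = 300 are all hypothetical, chosen only to show reading order coming from layout and structure roles coming from the source.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A piece of text found on the page (hypothetical layout output)."""
    text: str
    x: float  # left edge on the page
    y: float  # top edge on the page (0 = top of page)


# Hypothetical roles recovered from the LaTeX source, keyed by block text.
SOURCE_ROLES = {
    "1 Introduction": "H1",
    "Figure 1: System overview": "Figure",
}


def tag_blocks(blocks, source_roles, column_split=300.0):
    """Order blocks for a two-column page and attach a structure role."""
    # Assumed reading order: left column first, then top to bottom.
    ordered = sorted(blocks, key=lambda b: (b.x >= column_split, b.y))
    # Fall back to a paragraph tag when the source gives no role.
    return [(source_roles.get(b.text, "P"), b.text) for b in ordered]


blocks = [
    Block("Figure 1: System overview", x=320, y=40),
    Block("PDFs are widely used.", x=40, y=90),
    Block("1 Introduction", x=40, y=50),
]
tagged = tag_blocks(blocks, SOURCE_ROLES)
for role, text in tagged:
    print(f"<{role}> {text}")
```

In this sketch, the heading and paragraph in the left column come before the figure in the right column, even though the figure sits higher on the page. A real system would need far more robust layout analysis and a much richer link between source and PDF than an exact text match.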

“PDFs are notoriously hard to make accessible after the fact,” said Aaron Steinfeld, a research professor in the Robotics Institute (RI) and part of the iTagPDF team. “What we’re showing is that if you bring in information from the original document, you can do much better. That principle could extend far beyond LaTeX in the future.”

By preserving information that is often lost in the conversion from source document to PDF, iTagPDF produces outputs that are more usable and accurate for assistive technologies. The system then automatically embeds this metadata back into the PDF, which reduces the load for authors who previously had to do this work by hand.

“Though there are existing tools that explore this type of automation, they still require extensive manual work and can produce low-quality results,” said Peya Mowar, an RI Ph.D. student who led the research effort. “iTagPDF is designed to balance automation with accuracy. In our evaluations using a dataset of research papers paired with their source files, we established a strong performance baseline and, in many cases, surpassed the quality of manual tagging.”

iTagPDF won a Best Paper Award at the Association for Computing Machinery Conference on Human Factors in Computing Systems (CHI 2026). Mowar will present the work at the conference, where she is also serving as a student volunteer on the CHI accessibility team.

To learn more about the project, read the team’s paper.

For More Information: Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu