The Scholastic Commentaries and Texts Archive (SCTA) is a collaborative open-access infrastructure for the creation, curation, publication, and long-term preservation of scholarly textual data related to medieval and early modern intellectual traditions.
The SCTA’s data management practices are designed to ensure transparency, reproducibility, long-term preservation, and continuous scholarly improvement through open collaborative workflows and community stewardship.
The SCTA primarily works with public domain texts and scholarly transcriptions of historical sources. New data is produced through:
All textual data and metadata are maintained in open, machine-readable formats and are continuously improved through community contributions and scholarly review.
Primary textual data is encoded in XML following the TEI compliant LombardPress Transcription Guidelines designed for encoding scholarly scholastic editions.
SCTA textual data is accompanied by open and publically accessible metadata files recording necessary metadata (e.g. authors, manuscript witnesses, etc.)
Documentation for data models and encoding conventions can be found on the SCTA Documentation Page
Because the SCTA assigns identifiers at all levels of the textual hierarchy (including structural divisions, paragraphs, quotations, and individual words across all witnesses) the resulting corpus can be accessed and interconnected with extremely granular precision.
In addition, Git-based version control ensures that every historical state of the data is permanently addressable through unique content hashes and commit histories.
This combination of persistent SCTA identifiers and immutable Git versioning provides strong support for reproducible scholarship, citation stability, and long-term interoperability.
The SCTA employs automated and community-based quality control processes.
All data is maintained within publicly accessible Git repositories and undergoes version-controlled review workflows. Changes are submitted through pull requests and validated using continuous integration pipelines implemented with GitHub Actions.
Validation procedures include:
These workflows ensure that malformed or inconsistent data cannot be merged into production repositories without review and validation.
Because all revisions are preserved in Git history, the provenance and evolution of every change remains transparent and reproducible.
At specific milestones, editorial work is peer reviewed in collaboration with the Medieval Academy of America. Peer reviewed versions of any text are tagged and verifiable using signed certificates. See our peer review policy and 2019 “Decoupling Quality Control and Publication: The Digital Latin Library and the Traveling Imprimatur,” Co-authored with Sam Huskey, Digital Humanities Quarterly, 13:4 (2019).
All SCTA repositories are publicly hosted on GitHub under open-access licenses.
In addition to GitHub’s distributed infrastructure, SCTA data is redundantly backed up through multiple independent systems, including:
These overlapping systems provide resilience against accidental loss, corruption, or service interruption.
The SCTA primarily works with historical public-domain texts and open scholarly metadata. Consequently, privacy concerns and sensitive personal data issues are minimal.
No protected personal information, medical information, or confidential human subject data is collected or maintained within SCTA repositories.
Access controls for repository administration and deployment infrastructure follow standard institutional and GitHub security practices.
The SCTA follows an open-access and progressive publication model.
All repositories are publicly accessible and licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License:
Rather than delaying publication until the completion of a grant cycle or editorial project, SCTA data is published continuously as improvements are merged into public repositories.
When pull requests are accepted and merged:
This progressive publication model ensures rapid dissemination while maintaining full version transparency.
A core feature of the SCTA model is long-term community stewardship.
The SCTA community assumes ongoing responsibility for maintaining and preserving datasets beyond the lifetime of any individual grant-funded project. This infrastructure-oriented approach ensures continuity and sustainability in ways often unavailable to isolated short-term research initiatives.
Long-term preservation is supported through:
SCTA sustainability is funded through a consortium-style membership model in which supporting libraries and institutions contribute annual membership fees. Grant-funded projects utilizing SCTA infrastructure also contribute negotiated membership support during the lifetime of the grant. For more information on community funding please see our membership policy page
Financial administration and stewardship are managed through the SCTA’s fiscal sponsor, Council on Library and Information Resources (CLIR)
Responsibility for SCTA data stewardship is coordinated by the Director and General Editor (Jeffrey C. Witt) and distributed across the SCTA community, member projects, repository maintainers, and affiliated institutional partners.