Hiring: Data Scientist (Research Assistant) for PKP

Hi everyone, not sure if this is the correct thread to post this, but I found this interesting announcement on Twitter.

I've copied the text below:

Hiring: Data Scientist (Research Assistant) – Public Knowledge Project & ScholCommLab

Organizations: Public Knowledge Project (PKP) and the Scholarly Communications Lab (ScholCommLab)

Contact: Dr. Juan Pablo Alperin, Scientific Director, PKP; Co-Director, ScholCommLab (jalperin@sfu.ca)

Job Title: Data Scientist (Research Assistant)

Project Description

The Public Knowledge Project (PKP) and the Scholarly Communications Lab (ScholCommLab) at Simon Fraser University are looking for a research assistant to work on the project "Metadata for everyone: Identifying and measuring metadata quality issues across cultures," starting as soon as possible and no later than September 30, 2022.

Metadata is a vital aspect of academic publishing. It ensures accurate identification and citation of a work. It can improve discoverability, access, dissemination, preservation, and, arguably, research impact. It can help disambiguate similar works. However, despite its importance, little is known about the quality of the metadata that is currently in use, and the impacts of poor metadata have rarely been studied. This project will therefore explore the quality, consistency, and completeness of metadata from various individual journals and communities. The project will pay special attention to elements that are most likely to vary across cultures, such as names and those that are potentially multilingual, with the understanding that metadata issues do not affect all communities in the same way.

Full details of the project can be found here: PKP/ScholCommLab Crossref RFP Response (Public). The Research Assistant will assist in "Phase 2" of the project. This phase will build upon the sample analysis in Phase 1 and will include big-data analyses to quantify the completeness, inconsistencies, and idiosyncrasies found in publication metadata, as well as attempts to automate heuristics to identify and/or resolve these issues.


Responsibilities

  • Analyze metadata issues identified in Phase 1 and determine if additional examples are needed
  • Determine a sampling strategy of records from Crossref’s database for Phase 2 analysis
  • Develop heuristics, classifiers, and/or other means of computationally detecting metadata issues across records
  • Publish and document all code and related documents to encourage scrutiny and reuse


Qualifications

  • Essential:

    • Understanding of data concepts (metadata, data quality)
    • Able to analyze and synthesize data (big data, data analysis)
    • Expert knowledge of either Python or R
    • Knowledge of Natural Language Processing tools and Machine Learning methods (e.g., Python’s NLTK)
    • Detail-oriented and highly organized
    • Strong written and verbal communication skills
  • Valuable:

    • Knowledge of scholarly communications and academic publishing
    • Expertise with metadata schemas
    • Knowledge of multiple languages
    • Experience with XML and/or JSON

Interested people should apply even if they don’t feel that their background is a 100% match with the position description. All candidates will be given full consideration.

Rate of pay

CAD $40-$55/hr (including vacation and statutory holiday pay, no medical or dental benefits) based on experience.

Working arrangements

PKP and the ScholCommLab are remote teams, and this position will likewise be remote. The chosen candidate should be highly motivated and able to work independently under limited supervision. The position is for up to 300 hours, to be completed before December 15th, 2022. Working hours are flexible, with some obligation to participate in scheduled meetings.

Application process

Interested applicants should apply with a resume and a brief cover letter outlining their relevant experience. Additional materials (e.g., examples of previous work) are welcome, but not required. Applications can be sent to Dr. Alperin at jalperin@sfu.ca with the subject line: “Metadata Research Assistant.” Applications will be reviewed as they are received. All applications received by September 9th will be given full consideration, but applications will remain open until the position is filled.

Equity and inclusion

Equity and diversity are essential to academic excellence. An open and diverse community fosters the inclusion of voices that have been underrepresented or discouraged. We encourage applications from members of groups that have been marginalized on any grounds, including, but not limited to, gender identity or expression, sexual orientation, disability, age, and/or status as Black, Indigenous, or a Person of Colour (BIPOC). All qualified candidates are encouraged to apply.

About PKP

PKP is a university-based initiative developing (free) open source software and conducting research to improve the quality, reach, and diversity of scholarly publishing. PKP’s various website platforms, including Open Journal Systems, Open Preprint Systems, and Open Monograph Press, guide users through the editorial workflow of scholarly publishing, including submission, review, editing, publishing and indexing. Thousands of people around the world are now using the software to publish independent journals on a peer-reviewed and open access basis, greatly increasing the public and global contribution of research and scholarship.

About the ScholCommLab

The ScholCommLab is a diverse multidisciplinary and multinational team of researchers interested in all aspects of scholarly communication. Based in Ottawa and Vancouver, Canada, the lab explores a wide range of questions using a combination of computational techniques, innovative methods, and traditional mixed methods to investigate how knowledge is produced, disseminated and used. The ScholCommLab is co-directed by Stefanie Haustein and Juan Pablo Alperin and is associated with the School of Information Studies at the University of Ottawa and the School of Publishing and the Public Knowledge Project at Simon Fraser University. The ScholCommLab values and practices open science and has an established code of conduct and clear authorship guidelines.