Automating Research Code Allocation at the University of Newcastle
About the University of Newcastle
The University of Newcastle is a research-intensive university and a leading contributor to research in Australia and the world, ranking in the top 9 in Australia for research income (HERDC). With over 450 research development partners, the University of Newcastle is consistently named amongst the top 10 Australian universities.
Australian research institutions are required to gather, process, and report on the research output of their staff. This is most keenly felt in Excellence in Research for Australia (ERA), the national research evaluation framework administered by the Australian Research Council (ARC). This reporting occurs roughly every 3 years, providing the Government, universities, industry, and prospective students with valuable information about research performance in Australian higher education institutions.
Each year, University of Newcastle staff and students produce over 3,000 research publications. Each publication must be assigned at least one, and commonly 2-4, Field of Research (FoR) codes that accurately categorise the discipline or disciplines in which the research was undertaken. Manually reviewing each publication and assigning FoR codes can take over 2,500 hours per ERA cycle. The Research and Innovation (R&I) department wanted to know whether there was a more efficient way to assign FoR codes to the University's research output.
Intellify and the University of Newcastle formed a cross-functional team with experts in the domain, data systems, and data science techniques required to develop a machine learning classifier solution. The project followed the Intellify process for developing machine learning systems, which includes:
- identify the value-creation opportunity to ensure project success.
- define the factors that affect the problem to make sure we have representative data.
- prototype to prove the value of the solution.
- deploy to help the business realize the value of the system.
The result for the University was a machine learning system that takes into account both structured and unstructured data from 5 different systems to determine the most likely FoR codes, from a list of 157 potential classifications, for a given research paper. To derive its predictions, the classifier model took in information about the authors' research background and publication history, the journal the article was published in, and the "natural language" contents of the paper. To ensure the best possible outputs, we worked with the domain stakeholders to overlay business rules on top of the machine learning system's recommendations. The R&I team could then trigger the solution whenever needed to generate a new set of predictions as new publications became available.
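As an illustration of how business rules might be overlaid on a classifier's recommendations, here is a minimal Python sketch. It is not the project's actual implementation. The function name, the exclusion rule, the score threshold, and the placeholder code labels (FOR_A and so on) are all assumptions made for the example. The only details taken from the description above are that each publication needs at least one code and commonly receives 2-4.

```python
from typing import Dict, List, Set


def select_for_codes(
    scores: Dict[str, float],
    excluded_codes: Set[str],
    min_score: float = 0.2,  # assumed threshold, not from the project
    max_codes: int = 4,      # publications commonly receive 2-4 codes
) -> List[str]:
    """Pick FoR codes from classifier scores after applying a business rule."""
    # Hypothetical business rule: drop codes that domain experts have ruled out.
    allowed = {code: s for code, s in scores.items() if code not in excluded_codes}
    if not allowed:
        return []
    # Rank the remaining codes from most to least likely.
    ranked = sorted(allowed, key=lambda code: allowed[code], reverse=True)
    # Keep codes at or above the threshold, capped at max_codes.
    chosen = [code for code in ranked if allowed[code] >= min_score][:max_codes]
    # Every publication needs at least one code, so fall back to the top-ranked one.
    return chosen or ranked[:1]


# Placeholder scores standing in for the classifier's output for one paper.
example_scores = {"FOR_A": 0.62, "FOR_B": 0.31, "FOR_C": 0.05, "FOR_D": 0.45}
print(select_for_codes(example_scores, excluded_codes={"FOR_D"}))
# prints ['FOR_A', 'FOR_B']
```

Keeping the rules in a separate step after the model's scoring means domain stakeholders can adjust them without retraining the classifier.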
How we helped
The project was a collaborative effort between Intellify and the University of Newcastle, with stakeholder representation across different departments, functions, and skillsets to ensure a successful project outcome. While it focused on delivering benefits through automation, this project also demonstrated to the University the possibilities and impact that machine learning can offer.
By eliminating the manual process that originally took approximately 2,500 hours, the FoR classifier pilot significantly reduced the time taken to allocate codes to research papers. Additionally, because the classifier can be run just-in-time, the University can get a more current view of its research outputs and identify research gaps in a more timely manner.