The BioCaster website ('BioCaster') at the University of Cambridge provides people interested in public health and safety with structured information about disease outbreaks, along with links to Internet news and related knowledge resources.
Early detection and tracking of infectious disease outbreaks by monitoring many thousands of Internet news feeds simultaneously.
Using Machine Translation technology to overcome the language barrier in 10 languages: Arabic, Chinese, French, Indonesian, Farsi, Korean, Portuguese, Spanish, Russian, and Swahili.
A web/database server and a backend cluster computer equipped with hybrid symbolic-neural NLP technology which continuously scans hundreds of RSS newsfeeds from local and national news providers.
First launched in 2006, then upgraded and re-launched as BioCaster 2020 by a team of epidemiologists, computer scientists, social scientists and computational biologists.
Prof. Nigel Collier, UK PI, University of Cambridge
Prof. David Buckeridge, Canada PI, McGill University
Prof. Nicholas King, Canada Co-I, McGill University
Dr Zaiqiao Meng, University of Cambridge
Dr Anya Okhmatovskaia, McGill University
Dr Maxime Polleri, McGill University
Mr Guido Powell, McGill University
Ms Yannan Shen, McGill University
Ms Iris Ganser, McGill University
Dr Zihao Fu, University of Cambridge
Ms Meiru Zhang, University of Cambridge
The emergence of disease outbreaks is a matter of the greatest importance to the international community. A rapid response depends crucially on timely evidence, yet traditional bio-surveillance using human networks is often unavailable in real time, patchy in geographic coverage, and tuned to specific diseases. Digital disease surveillance (DDS) using Web-based news and social media data aims to overcome some of these limitations. Real-time DDS was pioneered in the early 2000s by the Canadian GPHIN system, which detected the first evidence of SARS; it has since been joined by other systems such as BioCaster.
BioCaster is a fully automated, real-time media monitoring system based on Natural Language Processing (NLP) technology. The system was first launched in 2006 as a research prototype by Dr Nigel Collier at the National Institute of Informatics in Japan, and ran until 2012 with funding support from multiple sources, including the Japan Society for the Promotion of Science. With funding from the Canada-UK Artificial Intelligence Initiative, Principal Investigators from the University of Cambridge and McGill University have partnered to upgrade and re-launch BioCaster as part of the EPI-AI project. EPI-AI is a team of epidemiologists, computer scientists, social scientists and computational biologists working together to improve early warning for public health. Both PIs are currently members of the WHO’s Epidemic Intelligence from Open Sources (EIOS) initiative.
Early detection and tracking of infectious disease outbreaks requires access to information from a variety of sources. Increasingly, this means monitoring many thousands of Internet news feeds simultaneously. However, three difficulties arise when finding information with traditional search methods. Firstly, the massive volume of dynamically changing, unstructured news data makes it extremely difficult for governments and public health workers to obtain a clear picture of an outbreak. Secondly, the initial reports of an outbreak are contained in only a few news articles, which are usually overlooked by simple keyword indexing methods. Thirdly, the initial reports of an infectious disease usually appear in local, non-English news media. To capture outbreak information in the most timely manner, it is therefore crucial for computer systems to understand several languages. As part of the EPI-AI project we have partnered with SDL (now part of RWS) to use their Machine Translation Edge technology to overcome the language barrier in 10 languages: Arabic, Chinese, French, Indonesian, Farsi, Korean, Portuguese, Spanish, Russian, and Swahili.
The second generation of BioCaster has two major components: a web/database server (built on Elasticsearch and Kibana) and a backend cluster computer (Rocks) equipped with hybrid symbolic-neural NLP technology that continuously scans hundreds of RSS newsfeeds from local and national news providers. Because the NLP system has detailed knowledge of important concepts such as diseases, pathogens, phenotypes, people, places and drugs, it can semantically index the relevant parts of news articles, giving users quicker and highly precise access to information. This knowledge comes from annotated text collections (e.g. the PheneBank and COMETA corpora), gazetteer lists of nomenclature and the BioCaster ontology, all of which are currently under development.
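To illustrate what semantic indexing means in practice, here is a minimal Python sketch, assuming a simple dictionary (gazetteer) lookup. It is not BioCaster's actual pipeline, which combines symbolic and neural NLP and draws on the annotated corpora and ontology described above; the gazetteer entries, concept-type names and the functions `index_article` and `search` are all hypothetical. Each article is tagged with canonical concept labels, so a query for a concept also retrieves articles that use a different spelling or surface form.

```python
import re

# Hypothetical toy gazetteer (NOT BioCaster's real resources):
# surface term -> (concept type, canonical label).
# Spelling variants such as "diarrhea"/"diarrhoea" share one canonical label.
GAZETTEER: dict[str, tuple[str, str]] = {
    "cholera": ("disease", "Cholera"),
    "vibrio cholerae": ("pathogen", "Vibrio cholerae"),
    "diarrhoea": ("phenotype", "Diarrhoea"),
    "diarrhea": ("phenotype", "Diarrhoea"),
    "yemen": ("place", "Yemen"),
    "oral rehydration salts": ("drug", "Oral rehydration salts"),
}

# One case-insensitive, word-bounded pattern per term,
# so "cholera" does not fire inside "cholerae".
PATTERNS = [
    (re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE), ctype, label)
    for term, (ctype, label) in GAZETTEER.items()
]


def index_article(title: str, body: str) -> dict[str, object]:
    """Build an index document: the article text plus sorted canonical labels per concept type."""
    text = f"{title}\n{body}"
    concepts: dict[str, set[str]] = {}
    for pattern, ctype, label in PATTERNS:
        if pattern.search(text):
            concepts.setdefault(ctype, set()).add(label)
    doc: dict[str, object] = {"title": title, "body": body}
    for ctype, labels in concepts.items():
        doc[ctype] = sorted(labels)
    return doc


def search(docs: list[dict[str, object]], ctype: str, label: str) -> list[object]:
    """Return titles of documents annotated with the given canonical concept label."""
    return [d["title"] for d in docs if label in d.get(ctype, [])]


if __name__ == "__main__":
    docs = [
        index_article("Cholera cases rise in Yemen",
                      "Officials report acute diarrhea; oral rehydration salts are being distributed."),
        index_article("Water testing finds Vibrio cholerae",
                      "Clinics in the district are seeing more patients with diarrhoea."),
    ]
    # The concept query matches both spelling variants, unlike a keyword search for "diarrhea".
    print(search(docs, "phenotype", "Diarrhoea"))
    # prints ['Cholera cases rise in Yemen', 'Water testing finds Vibrio cholerae']
```

In a deployment built on Elasticsearch, documents of this shape could be stored with each concept type as a keyword field and filtered with term queries. Plain lookup like this cannot resolve ambiguity or negation, which is one reason a real system would add statistical or neural components on top of the symbolic resources.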
We are making the new BioCaster system available for public access and feedback in the hope that it will be useful to those interested in the field. Software resources are also expected to be released as the project progresses. Supplementary information will be published in international conference proceedings and journals.
Feedback on the new BioCaster research prototype is very welcome. Please send your comments to Prof. Nigel Collier.
About this data
BioCaster is only possible thanks to freely available information sources on the Web. We are particularly grateful to the sources listed below. Whilst we acknowledge these organisations, mention here does not imply any endorsement or affiliation.
Google News – a commercial news aggregator service developed by Google.
ProMED-mail – a program of the International Society for Infectious Diseases for identifying unusual health events.
MediSys – a fully automated, event-based media monitoring system that rapidly identifies potential public health threats from open source media.
World Health Organisation – a specialised agency of the United Nations responsible for international public health.
The United Nations – an intergovernmental organisation aiming to maintain international peace and security.
The European Centre for Disease Prevention and Control – an agency of the European Union aimed at strengthening Europe’s defences against infectious diseases.
Center for Infectious Disease Research and Policy – a center within the University of Minnesota focusing on public health preparedness.
EurekAlert! – a non-profit news-release distribution platform operated by the American Association for the Advancement of Science (AAAS).
China News Service – a state-level news agency in China.
The Food and Agriculture Organization of the United Nations – a specialised agency of the United Nations that leads international efforts to defeat hunger.
Please note that disease news data changes rapidly and differs by location and language, so it may not reflect the true pattern of disease outbreaks in some areas: some areas may be over-reported and others under-reported. The numbers of news reports on BioCaster may differ from aggregated data on other disease outbreak monitoring sites because the data is gathered and analysed in different ways. The relationship between news report counts and outbreak case numbers is complex, and understanding it is an ongoing focus of our research.
- SDL Machine Translation
- Dr David Pigott (University of Washington)
- Dr Jake Dunning (Public Health England)
- Dr Jens Linge (EU Joint Research Centre)
- Dr Susanna Ogunnaike-Cooke (GPHIN/PHAC)
- Prof. Bill Byrne (University of Cambridge)
- Prof. Simon Frost (Microsoft Research and London School of Hygiene and Tropical Medicine)
- Dr Philip Abdelmalik (World Health Organisation)
Nigel Collier, Professor of Natural Language Processing