Computer-Assisted Reporting

Overview

In this certified e-Master Class you will gain insights on using public databases, internet databases, social media and information extraction to create powerful visualizations.

With more and more people relying on social media and blogs to stay informed, it is becoming increasingly difficult for traditional news outlets to attract an audience.

Data journalism, which means gathering, filtering and visualizing what is happening beyond what the eye can see, has growing value. By using data to connect the dots, journalists can give meaning to small points of information that are often not relevant in a single instance, but hugely important when viewed from the right angle. They can cover a wide range of topics, such as the next financial crisis in the making, the economics behind the products we use, the misuse of funds or political blunders, and then present their findings in a compelling visualization of the data that leaves little room for argument.

Working with data is like stepping into vast, unknown territory. At first look, raw data is puzzling to the eyes and to the mind. Data as such is unwieldy, and it is quite hard to shape it correctly for visualization. It takes experienced journalists, who have the stamina to look at often confusing, often boring raw data, to see the hidden stories there. It is now possible for you to acquire these skills. By learning to write simple code, you can find the relevant information in databases that have millions of lines. You can also identify names or words in documents that would take a team of people centuries to read through. Furthermore, you can increase your capacity to make sense of it all by developing graphics that give a totally new dimension to your story.

This Introduction to Computer Assisted Journalism is led by our partner, Columbia Journalism School, and it will give you what you need to reach the next level in your profession.

The e-Master Class will cover the following topics:

How to work with public databases of all sizes, and to extract valuable information
How to create your own sets of data from the internet and social media
How to use the information to create powerful visualizations

Prerequisites

This certified e-Master Class is for investigative journalists and data journalists who already have a basic knowledge of Excel or Google Sheets.

Format of the course

Columbia Journalism School has developed and organized this course to include six live lectures offered over three weeks. 

  • The live lectures are 75 minutes long with 15 minutes for questions at the end.
  • The first module is offered as pre-recorded lectures with an accompanying PDF overview of the principles and skills reviewed. This pre-recorded class is three hours including breaks, allowing students to review and pause as they work. All participants must complete the pre-recorded lecture (Module 1) before joining the first live lecture (Module 2).
  • Lectures are offered synchronously twice per week.
  • Assignments are done individually or in small groups between classes. 

Schedule

Each session will take place from 16:00 to 17:45 CET (10:00 to 11:45 EST) on the following dates:

Module 1 (Pre-Recorded, 3 hours). Please complete this before Tuesday 7th June 2022.

This first module reviews the operations that can be done using Excel: collecting data from public databases, organizing it and making sense of it. It ends with the limitations of Excel and explains why, in certain situations, you need coding skills to achieve your objectives. Tool used during this session: Microsoft Excel.

Module 2 (Live lecture, 90 minutes). Tuesday 7th June 2022.

When a database has more than a million lines of information, it becomes necessary to use other tools. This second module introduces coding in Python and explains how it can be used with such large datasets. By using simple built-in libraries, you can quickly extract information, deliver basic statistics, and produce graphics. Tool used during this session: basic Python.
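As a rough illustration of what basic statistics with Python's built-in libraries can look like, here is a minimal sketch. The sample data, the column names and the `summarize` function are hypothetical and are not taken from the course material.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical sample standing in for a much larger public dataset.
SAMPLE_CSV = """city,year,spending
Oslo,2020,120
Oslo,2021,150
Helsinki,2020,90
Helsinki,2021,110
Paris,2021,300
"""


def summarize(csv_file):
    """Read rows one at a time and return a few basic statistics."""
    rows_per_city = Counter()
    spending = []
    for row in csv.DictReader(csv_file):
        rows_per_city[row["city"]] += 1
        spending.append(float(row["spending"]))
    return {
        "rows": len(spending),
        "mean": statistics.mean(spending),
        "median": statistics.median(spending),
        "rows_per_city": dict(rows_per_city),
    }


# With a real file you would pass open("some_file.csv", newline="") instead.
summary = summarize(io.StringIO(SAMPLE_CSV))
print(summary)
```

The same function works unchanged on a file with millions of rows, because the rows are read one by one rather than opened in a spreadsheet.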

Module 3 (Live lecture, 90 minutes). Thursday 9th June 2022.

This module looks specifically at how you can import large data frames from public databases. You also learn how to filter them so that only the valuable information is kept. Tools used during this session: basic Python, pandas, NumPy.
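Importing and filtering with pandas might look something like the sketch below. The dataset, the column names and the one-million threshold are invented for illustration; a real analysis would point `read_csv` at a file or URL from a public database.

```python
import io

import pandas as pd

# Hypothetical sample; in practice this would be a file path or URL.
SAMPLE_CSV = """agency,year,amount
Health,2020,1200000
Health,2021,800000
Transport,2021,2500000
Education,2021,400000
"""

contracts = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Keep only the 2021 contracts worth at least one million.
is_2021 = contracts["year"] == 2021
is_large = contracts["amount"] >= 1_000_000
large_2021 = contracts[is_2021 & is_large]

# Keep only the columns needed for the story.
result = large_2021[["agency", "amount"]].reset_index(drop=True)
print(result)
```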

Module 4 (Live lecture, 90 minutes). Tuesday 14th June 2022.

This session introduces data visualization. By using simple built-in libraries, you can build powerful pictures. These graphics can help the audience understand complicated situations much better than words. Tools used during this session: basic Python, Matplotlib.

Module 5 (Live lecture, 90 minutes). Thursday 16th June 2022.

In this module you'll learn how to extract information from social media and internet websites (web scraping). Instead of going through thousands of pages one after another, you can develop short programs that automate these searches for you, either by querying an API or by scraping the pages directly. Tools used during this session: basic Python, APIs, web scraping, Beautiful Soup.
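The idea behind scraping can be sketched with a few lines of Python. To keep the example free of external dependencies, it uses the standard library's `html.parser` as a stand-in for Beautiful Soup, and it parses an invented HTML snippet rather than downloading a real page. The class name `headline` and the `HeadlineParser` helper are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this first.
PAGE = """<html><body>
<h2 class="headline">Council approves budget</h2>
<p>Body text</p>
<h2 class="headline">Port contract under review</h2>
<h2 class="ad">Sponsored</h2>
</body></html>"""


class HeadlineParser(HTMLParser):
    """Collect the text of every <h2 class="headline"> element."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_headline = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2" and dict(attrs).get("class") == "headline":
            self._in_headline = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_headline = False

    def handle_data(self, data):
        if self._in_headline and data.strip():
            self.headlines.append(data.strip())


parser = HeadlineParser()
parser.feed(PAGE)
headlines = parser.headlines
print(headlines)
```

Running the same parser over many downloaded pages in a loop is what replaces reading them one by one.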

Module 6 (Live lecture, 90 minutes). Tuesday 21st June 2022.

In this module, you learn how to extract information from large numbers of text documents. Instead of having to read through thousands of pages to find what you are looking for, you can create automated queries that do it for you. Tools used during this session: text analysis, regular expressions.
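An automated query over text can be as small as one regular expression. The sketch below is only an illustration: the document text, the payment wording and the pattern are all invented, and real documents would need a pattern adapted to their own phrasing.

```python
import re

# Hypothetical document text standing in for thousands of pages.
DOCUMENT = """Payment of EUR 12,500 approved on 2021-03-14 for Nordic Consult AS.
Meeting notes, no financial content.
Payment of EUR 3,200 approved on 2021-04-02 for Baltic Media Oy.
"""

# Named groups capture the amount, the date and the payee on each matching line.
PAYMENT = re.compile(
    r"Payment of EUR (?P<amount>[\d,]+) "
    r"approved on (?P<date>\d{4}-\d{2}-\d{2}) "
    r"for (?P<payee>.+?)\.$",
    re.MULTILINE,
)

payments = [
    {
        "amount": int(m["amount"].replace(",", "")),
        "date": m["date"],
        "payee": m["payee"],
    }
    for m in PAYMENT.finditer(DOCUMENT)
]

for payment in payments:
    print(payment)
```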

Module 7 (Live lecture, 90 minutes). Thursday 23rd June 2022.

This last module reviews the tools that have been covered during the program. It also introduces new tools that can be used to further extract information from databases. Tools used during this session: all the tools covered during the course, SQL.
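To give a flavour of SQL, the sketch below runs a query from Python using the built-in `sqlite3` module and an in-memory database. The `grants` table and its rows are hypothetical, and the course may use a different database system.

```python
import sqlite3

# In-memory database with a hypothetical table of public grants.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grants (recipient TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO grants VALUES (?, ?, ?)",
    [
        ("Alpha Ltd", 2020, 50000),
        ("Alpha Ltd", 2021, 75000),
        ("Beta Co", 2021, 20000),
        ("Gamma Inc", 2021, 90000),
    ],
)

# Total amount received per recipient, largest first.
totals = conn.execute(
    "SELECT recipient, SUM(amount) AS total "
    "FROM grants GROUP BY recipient ORDER BY total DESC"
).fetchall()
conn.close()

for recipient, total in totals:
    print(recipient, total)
```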

Skills learnt

  • Coding: Python programming (including scraping, regular expressions, pandas, Selenium, Beautiful Soup, SQL, DataScript)
  • Data: Identifying and using large datasets to extract stories
  • Editorial: Getting access to a wider range of stories by connecting dots that are not readily available
  • Innovation: Computer assisted journalism opens new ground in the field of journalism by allowing access to untapped sources of information like public databases, the web and social media
  • Storytelling: Using computer assisted journalism allows journalists to tell stories that would not otherwise be released
  • Strategy: At a time when blogs and Instagram accounts are giving out 'news', being able to produce data-driven stories can be a way for traditional news outlets to differentiate themselves

Testimonials

"Even though I had some experience with Python, the exercises were challenging and rewarding. The modules were built on top of each other for the most part. This class provides important tools that are quite rare for journalists. I am frequently using the methods provided by this course to analyze statistics. Really helpful course!" Antti Tauriainen, Mobile journalist at Yle.

"I liked the variety, that all modules built on each other and worked together. The homework was challenging, but so it should be, instead of just repeating what we have already done and learned. I had to google quite a lot in addition to going back to both notes and videos, but it was a fun and learning challenge." Sindre Ness, Researcher at NRK - Norwegian Broadcasting Corporation.