Links

The following links are stored here for my personal convenience. My apologies if the organization of this page looks haphazard (well, it largely is). While the majority of the links relate directly to data analysis and programming, the bottom of the list holds a collection of links to work websites and major news services.

Data Analysis in general:

Analytics Vidhya

A good Indian website on data analysis.

The Open Source Data Science Masters

The Yhat Blog

statsguys

Online Statistics Education

Cross Validated

Data Science Central

DBMS 2

O’Reilly Data blog

Simply Statistics

NYTimes Data blog

NYTimes Developers Network

Flowingdata

Programming Historian

I love the Programming Historian website. It provides a series of tutorials on tasks of crucial importance to students and researchers in the humanities. If you have an idea for another tutorial, you can write and submit it!

Google Trends

Research at Facebook

Setosa.io

IDRE

Visualizing Economics

Junk Charts

EagerEyes

Data Stories

scyViz

Political Data Science

Dataviz Tools

A catalogue of data visualization tools, focusing on open-source and free solutions.

Storytelling with data

Information aesthetics

VizWiz

Perceptual Edge

Visualizing Data

IBM Data Science Experience blog

Reddit ‘Data is Beautiful’

Visualization Universe

Data Visualization Catalogue

Pew Research Center

Data visualization:

Plot.ly

RAWGraphs

Cynthia Brewer’s ColorBrewer

Duke Lib Visualization Types

Python resources:

Python documentation

Who doesn’t love reading documentation?

Beautiful Soup documentation

Practical Business Python

Real Python blog

Planet Python blog

Invent with Python blog

Python Library blog

Automate the Boring Stuff with Python

Invent with Python

Website of Albert Sweigart, who offers his books on programming with Python for free. He is also the author of Automate the Boring Stuff with Python.

effbot.org

trinket.io

Write and run code in your browser.

Python Course

ElementTree documentation

I guess there’s no need to mention it separately, but I access it so often!

Gutenberg library documentation

I thought that Project Gutenberg offered infinite possibilities for computational linguists. Alas, its website is for human use only. This Python library deals with book metadata.

Wikipedia library documentation

Access Wikipedia data through Python with this library.

Swirlpy

Swirlpy was meant to mirror the swirl functionality for Python. I am not sure whether any good lessons are out yet!

Pythonic Perambulations

Website run by Jake Vanderplas of the University of Washington, whom I finally stopped confusing with Jim Vallandingham. Jake is the author of the wonderful Python Data Science Handbook. I like that he hosts his website on GitHub.

Natural Language Toolkit (NLTK)

A wonderful tool for dealing with language data in Python. If you use it, cite: Bird, Steven, Edward Loper and Ewan Klein (2009), Natural Language Processing with Python. O’Reilly Media Inc.

Scikit-learn

Library for machine learning in Python.

Pandas resources:

Pandas documentation

R resources:

Rseek

Look for information in the world of R (because Google doesn’t).

Flowingdata tutorials

R-bloggers

R entries from a variety of different blogs, all in one place.

Quick-R

Cookbook for R

StackOverflow R

StackOverflow R FAQ

Google’s R style guide

R language definition

Swirl stats

RStudio blog

R4stats

ggplot2 documentation

Website of the amazing plotting system developed by Hadley Wickham.

RDataMining

The website provides some good-quality tutorials on data mining and text mining (and Twitter mining) in R.

World Bank data with R

JavaScript resources:

JSData

Eloquent JavaScript

Codepen

D3.js

Rickshaw

NVD3

Dimple.js

JavaScript is Sexy

jQuery tutorial

XML resources:

Python and XML

W3Schools

tizag.com

XPath

SQL and database administration:

MySQL documentation

Oracle SQL reference

SitePoint

Vertabelo

W3Schools SQL

Khan Academy SQL

SQLZoo

Interactive SQL textbook

Essential SQL

SQLCourse

Database Journal

SQL for Web Nerds

Stanford online database course

Schemaverse

BigQuery documentation

BigQuery tutorial

Text and editing:

Emacs

Sublime Text

Markdown

VIM Adventures

VIM Crash course

VIM Quick and dirty

Learning materials:

Udacity

Coursera

The Data School

Python Programming

DataCamp

Codecademy

freeCodeCamp

Beginner’s guide to Python from Python.org

MongoDB University

tutorialspoint

edX

School of Data

Princeton

Dash General Assembly

Corpus linguistics:

Developing linguistic corpora

Python for Linguists

Corpus Linguistic Methods

Martin Weisser’s website

Programming in Python for Linguists

Washington Uni corpus resources

OpenCorpora (Russian)

Textual Scholarship

Ethnologue

Git:

GitHub guides

Git reference for beginners

Forks in GitHub and Git

GitHub repositories:

Code for Data Science from Scratch

List of data science blogs

Introduction to Statistical Learning

A module for corpus linguistics

The Open Source Data Science Masters

Awesome Data Science

Awesome Learning

Awesome Machine Learning

Data Science Notebooks

Statistics and Machine Learning Notebooks

Data Science Resources

Awesome R

Data Science in R

Some more good notebooks

D3.js

Datasets:

Academic Torrents

Most of the recently uploaded torrents are Udacity course datasets, but there are others too. One of my favourites is a collection of 7,000 emails of Hillary Clinton.

UK Govt data

US Govt data

Indian Govt data

Russian Govt data

Enigma

Datahub

AWS datasets

Internet Archive

Because who doesn’t love the Internet Archive?

DBPedia

Data from Wikipedia, neatly arranged.

Princeton University curated datasets

Armed Conflict Database

Access to the database requires a yearly subscription, which isn’t cheap.

Peace and Conflict Research, Uppsala

Armed Conflict Location & Event Data

Uppsala Conflict Data Program

Gapminder

International Conflict Research Zurich

Quandl Collections

India Open Data

Berghof Foundation, Conflict Research

Pornhub Insights

Global Administrative Areas

Dataset of administrative boundaries, valuable for plotting geographic distributions.

ICPSR Michigan

Massive collection of academic datasets. Free registration.

Lokniti

Very valuable datasets on Indian elections—participation and results. Some of the data is not exactly free.

APIs:

API news on ProgrammableWeb

Europeana

Internet Archive API

Library APIs

Dictionary APIs

WikiMedia API

Uppsala Conflict Data API

Competitions:

Kaggle

TopCoder

DrivenData

Analytics Vidhya Hackathons

People:

Domas Mituzas

A blog devoted primarily to databases, MySQL and SQL.

Tom Augspurger

What you really should pay attention to is the author’s wonderful series of Jupyter Notebooks for working with Pandas, Modern Pandas!

Trevor Stephens

Some really nice tutorials for working with R.

Kristoffer Magnusson

A psychologist’s view on data analysis and visualization (tending towards R and d3.js).

Marijn Haverbeke

Marijn is the author of the wonderful Eloquent JavaScript book, which was my first introduction to programming. Look up his interesting MA thesis!

Morten Rand-Hendriksen

I know, Morten is not really an analyst. But I need this link. He publishes a lot about WordPress.

Armin Ronacher

Doug Hellmann

Yegor Bugayenko

Chuck Severance

With the advent of his course Programming for Everybody, Dr. Chuck became the Python guru of millions of programmer wannabes on the planet, me included.

Larry Wall

The appalling website of the great mind behind the Perl programming language (that is what happens when chartreuse is your favourite colour…).

Mick Hammond

Mick Hammond specializes in technology and social research. He writes extensively on online communities.

Dirk Hovy

Hadley Wickham

Hadley Wickham developed the ggplot2 plotting system in R.

Gary King

Cory Nissen

Jim Vallandingham

Irene Ros

Peter Beshai

Adam Pearce

Kennedy Elliott

Moritz Stefaner

Alberto Cairo

David Robinson

Sebastian Raschka

Author of the book Python Machine Learning.

Sean J. Taylor

Computational social scientist from Facebook. Looks a lot like Snowden, but I am pretty sure he is a different person.

Solomon Messing

Solomon Messing used to be a research scientist at Facebook, and has since shifted to Pew Research Center’s DataLabs. Doesn’t blog that often, but has a few good research papers on his website.

Steven Bird

One of the authors of NLTK for Python, and a professor of linguistics.

Phil Reed

Keith Newman

Keith is a statistics ninja at Newcastle University. I love his tutorials on displaying geographical data in R.

Scott Murray

Mike Bostock

Mike Bostock is the creator of D3. The website has some great tutorials for this library.

Andrew Gelman

Statistical methods and social sciences!

YouTube channels:

Data School

PyData

Master Code Online

I had no faith the first time I saw the content here. But then interesting tutorials began to appear, most recently a walkthrough of creating an SEO program in Python.

sentdex

RStatsInstitute

Quantitative Specialists

Corey Schafer

LearnWebCode

Baris Yuksel

Coding Entrepreneurs

Job websites:

Udacity Blitz

Unfortunately, only for Udacity graduates.

Upwork

Freelancer

Hired

Indeed

Glassdoor

LinkedIn

Stack Overflow Careers

Monster

Workaline (remote jobs)

Forex and stocks:

DailyFX

How the market works

BabyPips.com (Beginner’s guide to Forex)

Forex.com

The Guardian (economic section)

Financial Times

MarketWatch

NASDAQ historical data

End-of-day & historical stock data

Economic Times of India

Broker Chooser

Investopedia

Stock Trader

Stock Traders Daily

Saxo Academy

Tradingfloor

World news:

English:

Open Democracy

Political Critique

The Economist Intelligence Unit

Unlimited World

Wired

The Guardian

Al Jazeera

BBC

Russia Today

Wikileaks

Amnesty International news

Amnesty International research

ConstitutionNet

Conciliation Resources

German:

Deutsche Welle

Spiegel

Süddeutsche Zeitung

Russian:

Life.ru

Gazeta.ru

Lenta.ru

Polish:

Polska Agencja Prasowa

Culture.pl

Centrum Badania Opinii Społecznej

Wprost

Gazeta Wyborcza

Gemius

E-commerce Polska

Regional news:

India:

The Hindu

Economic and Political Weekly

Frontline

Scroll

Kafila

South Asia Terrorism Portal

Check the weekly South Asia Intelligence Review!

Ideas for India

Sri Lanka:

Colombo Page

Colombo Telegraph

Sri Lanka Brief

Sri Lanka Guardian

Sunday Times

Daily Mirror

Daily News

Centre for Policy Alternatives

The Island

Constitutional Assembly of Sri Lanka

TamilNet

Sangam

Tamil Nation, mirror

Groundviews

Tamil Canadian

Pakistan:

Usama Khilji

Eastern Europe:

Eastbook

Academic:

Max Planck Institute for the Study of Religious and Ethnic Diversity

The link leads to the Data Visualization section.

Centre for the Study of Developing Societies

German Institute of Global and Area Studies

International Affairs (Oxford Academic journal)

Constitutional Transitions

United Nations Peacemaker

E-international relations

Analitika, Center for Social Research

Journal of Democracy, downloadable

Project Muse