The following links are stored here for my personal convenience. My apologies if the organization of this page looks haphazard to you (well, it largely is). While the majority of the links are directly related to data analysis and programming, at the bottom of the list you can see a collection of links to work websites and major news services.
Data Analysis in general:
A good Indian website on data analysis.
The Open Source Data Science Masters
The Yhat Blog
Online Statistics Education
Data Science Central
O’Reilly Data blog
NYTimes Data blog
NYTimes Developers Network
I love the Programming Historian website. It provides a series of tutorials on tasks of crucial importance to students and researchers in the humanities. If you have an idea for another tutorial, you can write and submit it!
Research at Facebook
Political Data Science
A catalogue of data visualization tools, focusing on open-source and free solutions.
Storytelling with data
IBM Data Science Experience blog
Reddit ‘Data is Beautiful’
Data Visualization Catalogue
Pew Research Center
Cynthia Brewer’s ColorBrewer
Duke Lib Visualization Types
Who doesn’t love reading documentation?
Beautiful Soup documentation
Practical Business Python
Real Python blog
Planet Python blog
Invent with Python blog
Python Library blog
Automate the Boring Stuff with Python
Invent with Python
Website of Albert Sweigart, who offers his books on programming with Python for free. He is also the author of Automate the Boring Stuff with Python.
Write and run code in your browser.
I guess there’s no need to mention it separately, but I access it just so often!
Gutenberg library documentation
I thought that Project Gutenberg offered infinite possibilities for computational linguists. Alas, their website is for human use only. This Python library deals with book metadata.
Wikipedia library documentation
Access Wikipedia data through Python with this library.
Swirlpy was meant to mirror the swirl functionality for Python. Not sure if there are any good lessons out yet!
Website run by Jake Vanderplas of the University of Washington, whom I finally stopped confusing with Jim Vallandingham. Jake is the author of the wonderful Python Data Science Handbook. I like that he hosts his website on GitHub.
Natural Language Toolkit (NLTK)
A wonderful tool for dealing with language data in Python. If you use it, you should cite: Bird, Steven, Edward Loper and Ewan Klein (2009), Natural Language Processing with Python. O’Reilly Media Inc.
Library for machine learning in Python.
Look for information in the world of R (because Google doesn’t).
R entries from a variety of different blogs, all in one place.
Cookbook for R
StackOverflow R FAQ
Google’s R style guide
R language definition
Website of the amazing plotting system developed by Hadley Wickham.
The website provides some good-quality tutorials on data mining and text mining (and Twitter mining) in R.
World Bank data with R
Python and XML
SQL and database administration:
Oracle SQL reference
Khan Academy SQL
Interactive SQL textbook
SQL for Web Nerds
Stanford online database course
Text and editing:
VIM Crash course
VIM Quick and dirty
The Data School
Beginner’s guide to Python from Python.org
School of Data
Dash General Assembly
Developing linguistic corpora
Python for Linguists
Corpus Linguistic Methods
Martin Weisser’s website
Programming in Python for Linguists
Washington Uni corpus resources
Git reference for beginners
Forks in GitHub and Git
Code for Data Science from Scratch
List of data science blogs
Introduction to Statistical Learning
A module for corpus linguistics
The Open Source Data Science Masters
Awesome Data Science
Awesome Machine Learning
Data Science Notebooks
Statistics and Machine Learning Notebooks
Data Science Resources
Data Science in R
Some more good notebooks
Most recently, the majority of the torrent content uploaded has been Udacity course datasets, but there are others too. One of my favourites is a collection of 7,000 of Hillary Clinton’s emails.
UK Govt data
US Govt data
Indian Govt data
Russian Govt data
Because who doesn’t love the Internet Archive?
Data from Wikipedia, neatly arranged.
Princeton University curated datasets
Armed Conflict Database
Access to the database requires a yearly subscription, which isn’t cheap.
Peace and Conflict Research, Uppsala
Armed Conflict Location & Event Data
Uppsala Conflict Data Program
International Conflict Research Zurich
India Open Data
Berghof Foundation, Conflict Research
Global Administrative Areas
Dataset of administrative boundaries, valuable for plotting geographic distributions.
Massive collection of academic datasets. Free registration.
Very valuable datasets on Indian elections—participation and results. Some of the data is not exactly free.
API news on ProgrammableWeb
Internet Archive API
Uppsala Conflict Data API
Analytics Vidhya Hackathons
A blog devoted primarily to databases, MySQL and SQL.
What you should really pay attention to is the author’s wonderful series of Jupyter Notebooks on working with Pandas, Modern Pandas!
Some really nice tutorials for working with R.
A psychologist’s view on data analysis and visualization (tending towards R and d3.js).
I know, Morten is not really an analyst. But I need this link. He publishes a lot about WordPress.
With the advent of his course Programming for Beginners, Dr. Chuck became the Python guru of millions of programmer wannabes on the planet, me included.
The appalling website of the great mind behind the Perl programming language (that is what happens when chartreuse is your favourite colour…).
Mick Hammond specializes in technology and social research. He writes extensively on online communities.
Hadley Wickham developed the ggplot2 plotting system in R.
Author of the book Python Machine Learning.
Sean J. Taylor
Computational social scientist from Facebook. Looks a lot like Snowden, but I am pretty sure he is a different person.
Solomon Messing used to be a research scientist at Facebook, and has since moved to Pew Research Center’s DataLabs. He doesn’t blog that often, but has a few good research papers on his website.
One of the authors of NLTK for Python, and a professor of linguistics.
Keith is a statistics ninja at Newcastle University. I love his tutorials on displaying geographical data in R.
Mike Bostock is the creator of D3. The website has some great tutorials for this library.
Statistical methods and social sciences!
Master Code Online
I had no faith the first time I saw the content here. But then interesting tutorials began to appear, most recently a walkthrough of creating an SEO program in Python.
Unfortunately, only for Udacity graduates.
Stack Overflow Careers
Workaline (remote jobs)
Forex and stocks:
How the market works
BabyPips.com (Beginner’s guide to Forex)
The Guardian (economics section)
NASDAQ historical data
End-of-day & historical stock data
Economic Times of India
Stock Traders Daily
The Economist Intelligence Unit
Amnesty International news
Amnesty International research
Polska Agencja Prasowa
Centrum Badania Opinii Społecznej
Economic and Political Weekly
South Asia Terrorism Portal
Check the weekly South Asia Intelligence Review!
Ideas for India
Sri Lanka Brief
Sri Lanka Guardian
Centre for Policy Alternatives
Constitutional Assembly of Sri Lanka
Tamil Nation, mirror
Max Planck Institute for the Study of Religious and Ethnic Diversity
The link leads to the Data Visualization section.
Centre for the Study of Developing Societies
German Institute of Global and Area Studies
International Affairs (Oxford Academic journal)
United Nations Peacemaker
Analitika, Center for Social Research
Journal of Democracy, downloadable