Oct 30, 2014

Statistical Analysis of the Holy Quran (Part 1)

The English and Arabic corpus of the Holy Quran is a rich source for statistical analysis. For instance, the entire text corpus has half a million words and many thousands of distinct words. A rich dataset such as the Holy Quran therefore makes for an exciting journey of data exploration.
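As a rough illustration of how total and distinct word counts like these could be computed, here is a minimal sketch. It assumes a plain-text corpus and a simple regex tokenizer; the function name `corpus_stats` and the sample sentence are hypothetical, and this is not necessarily the method used in the post. Arabic text with diacritics would likely need a different tokenizer, since `\w` does not match combining marks.

```python
import re
from collections import Counter


def corpus_stats(text):
    """Return (total_words, distinct_words, counts) for a text.

    Tokenization is an assumption: lowercase the text, then take runs
    of word characters, allowing internal apostrophes.
    """
    tokens = re.findall(r"\w+(?:'\w+)*", text.lower())
    counts = Counter(tokens)
    return len(tokens), len(counts), counts


# Placeholder sentence, not taken from the corpus.
sample = "the quick brown fox jumps over the lazy dog"
total, distinct, counts = corpus_stats(sample)
print(total, distinct)  # 9 words in total, 8 of them distinct
```

On a real corpus, `counts.most_common(20)` would be a natural next step for a first look at the most frequent words.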

Ali Gajani Data Science 7

Aug 6, 2014

Data mining 1.5 million tweets for Twitter sentiment analysis

The contents of this blog post are drawn from a short research project by Group 10 of the Information Retrieval and Data Mining module at University College London. Instead of letting it rot in my Dropbox, I decided to free the knowledge in the hope that someone finds it useful.

Ali Gajani Data Science 6

May 14, 2014

How to open large text files (>5 GB) on a Mac?

A month ago, I downloaded a large dataset from Twitter. The .txt file consisted of around 1.5 million tweets in JSON and weighed in at 5.5 GB. At the time, I was working on a sentiment analysis project, and I wanted to look at the structure of the JSON in order to design a parser for processing the tweets. When I attempted to open the file in Sublime Text 2, my powerful Mac just gave up.
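One way to inspect the structure of such a file without loading it into an editor is to read only the first few lines. The sketch below is a minimal example and not necessarily the approach the post describes. It assumes the file holds one JSON object per line, which the excerpt does not confirm; the helper names `peek_lines` and `top_level_keys` and the demo records are hypothetical.

```python
import json
import os
import tempfile
from itertools import islice


def peek_lines(path, n=5):
    """Return the first n lines of a file without reading the rest.

    The file object is iterated lazily, so only the requested lines
    are pulled from disk.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]


def top_level_keys(json_line):
    """Parse one JSON object and return its sorted top-level keys."""
    return sorted(json.loads(json_line).keys())


if __name__ == "__main__":
    # Tiny stand-in file with made-up records; a real dump would be far larger.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write('{"id": 1, "text": "hello"}\n{"id": 2, "text": "world"}\n')
    try:
        first = peek_lines(tmp.name, n=1)[0]
        print(top_level_keys(first))  # ['id', 'text']
    finally:
        os.remove(tmp.name)
```

If the whole file turned out to be a single very long line, iterating by line would still read everything; reading a fixed-size chunk with `fh.read(size)` would be the safer variant in that case.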

Ali Gajani Data Science 5

Apr 12, 2014

Graph Theory 101: Directed and Undirected Graphs

This is a very short introduction to graph theory. We will be talking about directed and undirected graphs, the formulas for the maximum possible number of edges in each, and the mathematical proofs of why those formulas hold. This is my first use of LaTeX on Mr. Geek.
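For reference, the standard results for simple graphs on n nodes (no self-loops, no repeated edges, which is an assumption about the setting here) are n(n-1)/2 edges for an undirected graph and n(n-1) for a directed one. The sketch below checks both formulas against a brute-force count for small n; the function names are hypothetical.

```python
from itertools import combinations, permutations


def max_edges_undirected(n):
    """Maximum edges in a simple undirected graph: one per unordered pair."""
    return n * (n - 1) // 2


def max_edges_directed(n):
    """Maximum edges in a simple directed graph: one per ordered pair."""
    return n * (n - 1)


# Brute-force check: enumerate every possible edge for small graphs.
for n in range(1, 7):
    nodes = range(n)
    assert max_edges_undirected(n) == len(list(combinations(nodes, 2)))
    assert max_edges_directed(n) == len(list(permutations(nodes, 2)))

print(max_edges_undirected(5), max_edges_directed(5))  # 10 20
```

The directed count is exactly twice the undirected one because each unordered pair of nodes can carry an edge in either direction.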

Ali Gajani Data Science 0

Mar 2, 2014

Measuring influence in a group using social network analysis

I have decided to publish the contents of my Complex Networks and Web coursework project here on Mr. Geek. The information in this post might be complex for some, but I assure you that it will be a good long read. I have included lots of pictures to make sure you don’t get bored by the swathes of text.

Ali Gajani Data Science 1

Feb 22, 2014

These 82 Countries (GDP) are worth less than WhatsApp

A bit of data crunching shows the enormity of the number, $19 billion. Yes, 82 countries (and more) have a GDP (Gross Domestic Product) lower than what WhatsApp is worth. I used Excel and the 2012 GDP data from the World Bank’s site. The list is shown below.

Ali Gajani Data Science 0