Saturday, October 22, 2016

Data Science in Pakistan

Pakistan should invest in data science (text mining, machine learning, NLP, etc.) as a field of sustained research for years to come.

I will share three reasons:
  1. No matter which industry we want to develop, sustain or improve, we need to stay ahead of other nations while maintaining our own uniqueness. For this to happen, our own experts need to analyse a lot of our own data over a long period of time (so that nothing crucial leaks to the outside world).
  2. Adopting solutions from research outcomes can shape education in a unique way: for example, what to study, what to combine in later degrees, or what experience to cultivate to make the most of a qualification. Analysing the questions students commonly ask, through such adopted systems, can narrow the gap between knowledge learned and knowledge applied. This task should never be outsourced; we should not depend on other nations for our critical ambitions.
  3. Analysing anomalies in any sector should be aided by automated systems developed inside Pakistan, to ensure protection of our "sensitive data", which may define statistics about Pakistan and its population.
Proposition: we need a government-funded research facility for data science, where scientists and researchers are hired aggressively with a sustainable budget for decades. These hires should be given a free hand and peace of mind to publish their work in top research venues. We need a mix of foreigners (bringing them to Pakistan is not an issue) and Pakistanis to kick-start the process.

Wednesday, February 11, 2015

Parallel Universe Fact or Fiction

If you are not aware of what a parallel universe is, here are two videos that introduce the basic concept. If you are short on time, watch the second one and keep reading.

1- National Geographic - Parallel Universes (~45 mins doc) http://dai.ly/xk0r0q
2- Drama series called Fringe - A short explanation (~4 min) http://youtu.be/jnA-C6DmRSM

A parallel universe (or universes) is a universe where everything is exactly the same as in ours (with a version of each of us) except for a few differences. In one universe 9/11 took place, but in another it didn't (hence that universe has its own history with some changes). In one universe Adolf Hitler never launched any attack, leading to a different history. Likewise, in another universe I (my version there) did not decide to write this blog, so the history that followed differs from the one where I decided to write it. By now the basic concept of a parallel universe should be understandable, so let's continue.

Among the theories on parallel universes, one takes the position that each decision splits the universe into multiple universes. For example, in one universe person "x" might decide to eat a chocolate at instant "t", while in another "x" decides not to eat a chocolate at time "t". This seems like a very simple theory, but it makes no sense to me when I consider the possibilities exhaustively. In one universe a version of person "x" dies and in another he does not, and if we continue this exhaustive trend then there must always remain a universe where person "x" lives forever (because in one universe he dies at "t", in another he does not die at "t" but at "t+1", in yet another he does not die at "t+1", and the trend continues). Similarly, there must exist a universe where everyone lives forever and a universe where nobody was ever born. Therefore, this theory commits philosophical suicide if investigated a little further. So my point is: let's enjoy the fiction of parallel universes. It is good for sci-fi drama but not good for science, so let's keep the two separate, and I will continue to live as a fan of parallel universes in sci-fi dramas.

By now you know I am a fan of parallel universes, so please don't forget to share some cool stuff with me (names of dramas, movies, etc.).

Sunday, December 28, 2014

Nation and its Progress: Values and Material Strength

According to my personal understanding, values and material strength define a nation's progress. Values can be easily understood as ethical lines such as liberalism, secularism, faith, etc. Material strength implies the understanding and application of science and technology in order to benefit from it. I will give an analogy to explain my understanding: a value or ethical line is like a spirit, and material strength is the body in which the spirit lives; together these two make up a living body. The absence of either would lead to death, i.e., a body cannot survive without a spirit in it, and likewise a spirit without a body is meaningless. Therefore, good values with no material strength would leave a nation nowhere but at the mercy of other nations that have better material strength. Likewise, a nation with no sense of values can lead a barbaric lifestyle and hence may end up committing societal suicide with its own technology, such as by inflicting wars inside and outside its borders for selfish gains. These unjust wars inflicted outside the nation will make the nation waging them morally insecure from within, because the returning soldiers will make their own land insecure with their ruthless behaviour. Similarly, without an understanding of values the rich will exploit the poor, using the best-known technology, laws and rules (twisting them in their favour) to do so, and eventually the nation will see protests and friction among its own citizens, causing further deterioration.

Being a "Muslim" and from "Pakistan", I will now limit the scope of my writing exclusively to these two entities. However, I am confident that people belonging to similar (especially from the Muslim world) or dissimilar entities can find some general points as a take-away. Being a Muslim, I will simply state Islam as the ethical line for a Muslim (I assume that the person is practising the faith), and being a Pakistani I know we have not reached scientific enlightenment in different spheres of life by a long shot. What goes on in Pakistan is this: some of us observe that the problem lies in not practising good values, so they choose to learn and practise values, and this is where our youth takes inspiration in learning Islam. Others consider that scientific progress is the way forward, and therefore take inspiration from science; a few of them actually learn it or perfect it, but most of them are satisfied with merely finding heroes in science (of course, scientific enlightenment is not easy for everyone). I personally believe that the win-win here is to have people who address both issues simultaneously. They may choose to gain expertise in one, but they should have an understanding of both. A lot of us have heard the phrase "education brings progress". I agree, but I feel that people who say this sometimes do not comprehend the deep meaning behind the phrase, because education is not only about science (which some people actually imply by the sentence) but also about learning good values and practising them. The last thing a progressive society should want is to find itself with good technology but with barbaric behaviour towards each other (I don't see harmony here). Some examples where we are deficient in good values are the following:

1- We comment on others' mistakes, but when someone points out our own error, that becomes the last day of the friendship, with an implicit expression of "how dare you?"
2- We are friends with each other but say bad things behind each other's backs.
3- We follow a political party, and the wrongs of that party become justified because other political parties are also doing wrong.
4- We break our promises so easily; I say I will be there at 8:00 PM, but until 9:00 PM there is no sign of me.
5- At weddings, showing off money is more important than actually collecting the prayers of guests for the new couple.



I will conclude my article by showing how much we actually respect ethical lines (ours is Islam) and scientific progress as a nation.

"If a child is not sharp or intelligent send him/her to Madersa (Islamic school) and if a person is not fit for a challenging job then let him/her do a PhD degree". 

We need our top brass in Islamic schools, and likewise we need our top brass in PhD programs, but so far our collective effort as a nation is less than sufficient.

Sunday, May 11, 2014

[SIGIR2014 Demo] A System-Oriented View Towards Bias in Search Process: Visualizing Perspectives in News

The notion of "bias" in search results has been investigated by the information retrieval research community. However, so far all investigations seem to take a user-oriented view of "bias" when considering the search process i.e., the user's tendency to click on results that are highly positioned within a search engine's ranked list, or the user's tendency to click on results that have more query terms in the search result title or summary. In the proposed demonstration accepted at SIGIR 2014, we take a system-oriented approach towards "bias" within the search process and offer a new interface for users to investigate the "perspective biases" in documents returned by a search engine.
To clearly illustrate what we mean by "perspective bias", we have focused on the news domain, where the inherent bias lies for the most part within the news collection itself, such as news web sites having a "leftist" or "rightist" agenda. Consider a case in which a user wishes to find information about a certain event (say, a bomb blast in a certain region). Instead of focusing on factual aspects, the search results returned may be polarized, i.e., relating the violence to a certain race, ethnicity, or political movement. This can prompt a user to explicitly evaluate the move from objective factual reporting to subjective reporting within the top results, and this is where perspective-aware search comes to the rescue, as shown in the figure below. Here, the user is asked to input a normal search query and a perspective, allowing the user to highlight the presence of that perspective in the search results.


For the purpose of demonstration, the system returns the top 10 news stories for the query from Bing, Yahoo and Google and then calculates a perspective score for each result while at the same time using graph visualizations to illustrate the perspective scores for each news source and each search engine.

Below is a video demonstration of the "perspective-aware search system". I will be attending SIGIR 2014 in Gold Coast, Australia, and for that I owe special thanks to the SIGIR Travel Grants Committee, which has funded my travel to SIGIR 2014. See you Information Retrieval folks in Australia, where I will be available to explain more aspects of this novel search interface.

Sunday, August 18, 2013

User-Defined Query Term Weighting in Lucene

I am sharing some simple code, with an explanation, showing how Lucene (pyLucene to be specific) can be used for Query Expansion.

What I will not discuss here is how to devise a strategy for finding new terms for Query Expansion (you can implement this on your own). What I will explain is how to assign different weights to query terms for a retrieval task.

Consider four documents with the following content:
D1 -> 'pagerank pagerank algorithm'
D2 -> 'pagerank algorithm algorithm'
D3 -> 'pagerank'
D4 -> 'algorithm'

This means the vocabulary of our corpus is just 'pagerank' and 'algorithm'. Each term has a corpus frequency of 4 and a document frequency of 3, so idf and cf are identical for both terms and do not influence the relative scoring.

In the attached source code you can see that we have boosted term 'pagerank' by 10 times compared to term 'algorithm'. The query is 'pagerank algorithm'.

Upon retrieval, document D3 has a score 10 times higher than D4, and likewise D1 has a higher score than D2 (but not 10 times higher, since each of these documents has 3 terms in total, which influences the scoring, unlike in the previous case). Please run the source code and observe the results.

Source code: http://codeviewer.org/view/code:35db

Version: pylucene-3.6
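
For readers who cannot run the pyLucene code, the effect of the boost can be approximated with a few lines of plain Python. This is only a rough sketch and not the pyLucene API: the `score` function, the `DOCS` and `BOOSTS` names, and the formula itself (boost times the square root of term frequency, scaled by a length norm and a coordination factor, loosely modelled on Lucene's classic TF-IDF scoring) are my assumptions for illustration. idf is left out because it is the same for both terms here.

```python
import math

# Toy corpus from the post.
DOCS = {
    "D1": "pagerank pagerank algorithm",
    "D2": "pagerank algorithm algorithm",
    "D3": "pagerank",
    "D4": "algorithm",
}

# Hypothetical per-term query weights: 'pagerank' is boosted 10x over 'algorithm'.
BOOSTS = {"pagerank": 10.0, "algorithm": 1.0}


def score(doc_text, boosts):
    """Simplified, Lucene-like score (an assumption, not Lucene's exact formula)."""
    tokens = doc_text.split()
    # Longer documents are penalised, as with Lucene's length normalisation.
    length_norm = 1.0 / math.sqrt(len(tokens))
    matched = [term for term in boosts if term in tokens]
    # Fraction of query terms present in the document.
    coord = len(matched) / len(boosts)
    # Each matching term contributes boost * sqrt(term frequency).
    total = sum(boosts[term] * math.sqrt(tokens.count(term)) for term in matched)
    return coord * length_norm * total


scores = {name: score(text, BOOSTS) for name, text in DOCS.items()}
for name, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(name, round(value, 3))
```

In this simplified model D3 scores exactly 10 times as much as D4, while D1 scores only about 1.3 times as much as D2, because both D1 and D2 contain both query terms. The absolute numbers produced by Lucene itself will differ, since it applies further normalisation.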

Saturday, July 27, 2013

EuroHCIR2013 Work Towards a New Search Interface namely Perspective-Aware Search

Recently an updated version was demoed at SIGIR 2014: http://dl.acm.org/citation.cfm?id=2611184

There are occasions when search results do not satisfy the information need and give a set of results completely different from what the user is looking for. A possible reason lies inside the returned documents, which carry some perspective while covering the topic, and the user may perceive this perspective as bias.
Let's take the following example scenarios:
  • Consider a case where a user wishes to find information about a certain event (say, a bomb attack in a certain region). The search results returned consist mostly of news reports that blame Islam (implicitly, through their writing style), relating it to terrorism in most cases. This prompts the user to explicitly observe how strongly Islam is related to terrorism in the returned set of search results.
  • Consider another case where a user wishes to find information about the roles and rights of women in Islam, but the search engine returns articles that tend to highlight oppression against women instead of women's rights and roles. In this case the user observes a correlation between women and oppression instead of a factual position on rights.
In the above cases, the user's information need may lead him/her towards an explicit investigation of the underlying document collection, and he/she may be interested in observing the amount of perspective tendency in various search results (e.g., news reports). Current search engines do not facilitate this need by highlighting perspectives while displaying the search results. Hence, we propose the concept of "perspective-aware search." The proposed search interface enables the user to explicitly analyze search results with a touch of perspective awareness.

The following presentation contains some screen-shots of the proposed search interface; I will be giving a demo of this system at the EuroHCIR Workshop, which is co-located with SIGIR2013.


The system is built on top of the WikiMadeEasy API which is an API for mining Wikipedia data and is the output of work I am doing towards my PhD thesis. Feel free to contact me for more details of the API. The full paper describing the system can be found here.

Tuesday, February 5, 2013

Python: Reading large bz2 file with bz2.BZ2File()

When reading a bz2 file in Python, you may find that the file is read only partially (the read stops before the end of the data).

The tip to overcome this problem is very simple: uncompress the bz2 file using an extraction utility (Ubuntu has a graphical utility by default). Once it is extracted, compress it back to bz2 and try reading it again; this time the problem may be solved.

Reason for the problem: the side that produced the bz2 file may have built it from multiple files, which is not well recognized by the bz2.BZ2File() functionality in Python.
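
If re-compressing the file is not practical, another option is to decompress it stream by stream. The following is a minimal sketch for Python 3, assuming the file consists of several bz2 streams concatenated together; the function name `read_multistream_bz2` and the `chunk_size` parameter are my own, chosen for illustration. It uses `bz2.BZ2Decompressor` and starts a fresh decompressor whenever one stream ends.

```python
import bz2


def read_multistream_bz2(path, chunk_size=1024 * 1024):
    """Yield decompressed bytes from a bz2 file that may hold several streams."""
    decompressor = bz2.BZ2Decompressor()
    with open(path, "rb") as raw:
        while True:
            data = raw.read(chunk_size)
            if not data:
                break
            while data:
                out = decompressor.decompress(data)
                if out:
                    yield out
                if decompressor.eof:
                    # This stream is finished; any leftover bytes belong to
                    # the next stream, so hand them to a new decompressor.
                    data = decompressor.unused_data
                    decompressor = bz2.BZ2Decompressor()
                else:
                    # The decompressor consumed everything and needs more input.
                    data = b""
```

A typical use would be `b"".join(read_multistream_bz2("dump.bz2"))`, where the file name is just a placeholder. Note that from Python 3.3 onwards `bz2.BZ2File` and `bz2.open` handle multi-stream files themselves, so this workaround mainly matters on older versions or when you want explicit control over the streams.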