tag:blogger.com,1999:blog-9436471227479796672024-03-28T20:29:56.048-07:00I am PC, powered by LinuxAnonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.comBlogger20125tag:blogger.com,1999:blog-943647122747979667.post-8970720272529399862016-10-22T05:48:00.003-07:002016-10-22T12:49:36.515-07:00Data Science in Pakistan<div dir="ltr" style="text-align: left;" trbidi="on">
Pakistan should invest in data science (text mining, machine learning,
NLP, etc.) as a sustained research effort for years to come.<br />
<br />
I will share three reasons:<br />
<ol style="text-align: left;">
<li>Whatever industry we want to develop, sustain or improve, we need to
stay ahead of other nations while maintaining our own uniqueness. For
this to happen, a lot of our own data must be analysed over time
by our own experts (so that nothing crucial leaks to the outside world).</li>
<li> Adopting solutions from research outcomes ca<span class="text_exposed_show">n
shape education in a unique way: for example, what to study, what to combine
in later degrees, or what experience to cultivate to make the most of a qualification in practice.
Analysing the questions students commonly ask, through such adopted systems,
can narrow the gap between knowledge learned and knowledge applied. This
task should never be outsourced; we should not depend on
other nations for our critical ambitions.</span></li>
<li><span class="text_exposed_show">Anomaly analysis in any
sector should be aided by automated systems developed inside
Pakistan, to protect our "sensitive data", which may reveal
statistics about Pakistan and its population.</span></li>
</ol>
<div class="text_exposed_show">
<b>Proposition</b>: we need a government-funded research facility for data
science, where scientists and researchers are hired
aggressively with a sustainable budget spanning decades. These hires should be
given a free hand and peace of mind to publish their work in top research venues. We need a
mix of foreigners (bringing them to Pakistan is not an issue) and
Pakistanis to kick-start the process. </div>
</div>
Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com1tag:blogger.com,1999:blog-943647122747979667.post-89075563798347655722015-02-11T14:37:00.000-08:002015-02-11T14:39:06.826-08:00Parallel Universe Fact or Fiction<div dir="ltr" style="text-align: left;" trbidi="on">
If you are not familiar with the idea of a parallel universe, here are two videos that cover the basic concept. If you are short on time, watch the second one and keep reading.<br />
<br />
1- National Geographic - Parallel Universes (~45 mins doc) <a href="http://dai.ly/xk0r0q">http://dai.ly/xk0r0q</a><br />
2- Drama series called Fringe - A short explanation (~4 min) <a href="http://youtu.be/jnA-C6DmRSM">http://youtu.be/jnA-C6DmRSM</a><br />
<br />
So a parallel universe (or universes) is a universe where everything is exactly the same as in ours (with a version of each of us) except for a few differences. In one universe 9/11 took place but in another it didn't (hence it has its own history with some changes); in one universe Adolf Hitler never attacked anyone, which gives a different history; likewise, in another universe I (that version of me) did not decide to write this blog, so its history afterwards differs from the one where I did. By now the basic concept of a parallel universe should be clear, so let's continue.<br />
<br />
Among the theories of parallel universes, one takes the position that each decision splits the universe into multiple universes. For example, in one universe person "x" decides to eat a chocolate at instant "t", while in another, at the same time "t", "x" decides not to. This seems like a very simple theory, but it makes no sense to me once I look at the exhaustive possibilities. In one universe a version of person "x" dies and in another he does not; if we continue this exhaustive trend, then there must always remain a universe where person "x" lives forever (because in one universe he dies at "t", in another he does not die at "t" but at "t+1", in yet another he does not die at "t+1", and so on). Similarly, there must exist a universe where everyone lives forever and a universe where nobody was ever born. Therefore, this theory commits philosophical suicide if investigated a little further. So my point is: let's enjoy the fiction of parallel universes. It is good for sci-fi drama but not for science, so let's keep the two separate, and I will continue to live as a fan of parallel universes in sci-fi dramas.<br />
<br />
By now you know I am a fan of parallel universes, so please don't forget to share some cool stuff with me (names of dramas, movies, etc.). </div>
Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com0tag:blogger.com,1999:blog-943647122747979667.post-52533798188336608042014-12-28T07:33:00.001-08:002014-12-28T19:28:27.546-08:00Nation and its Progress: Values and Material Strength<div dir="ltr" style="text-align: left;" trbidi="on">
In my personal understanding, <i>values</i> and <i>material strength</i> together define a nation's progress. Values can be understood as ethical lines such as liberalism, secularism, faith, etc. Material strength means understanding and applying science and technology in order to benefit from them. An analogy explains my view: values, or the ethical line, are like a spirit, and material strength is the body in which the spirit lives; together the two make up a living being. The absence of either leads to death, i.e., a body cannot survive without a spirit in it, and likewise a spirit without a body is meaningless. Therefore, good values with no material strength leave a nation at the mercy of other nations that have greater material strength. Likewise, a nation with no sense of values can lead a barbaric lifestyle and may end up committing societal suicide with its own technology, for example by waging wars at home and abroad for selfish gains. Unjust wars waged abroad make the waging nation morally insecure from within, because the returning soldiers bring their ruthless behaviour back to their own land. Similarly, without an understanding of values the rich will exploit the poor, using the best available technology, laws and rules (twisting them in their favour) to do so, and eventually the nation will see protests and friction among its own citizens, causing further deterioration.<br />
<br />
Being a "Muslim" and from "Pakistan", I will now limit the scope of my writing to these two identities. However, I am confident that people from similar backgrounds (especially from the Muslim world) or dissimilar ones can find some general take-away points. As a Muslim, I will simply state Islam as the ethical line for a Muslim (I assume that the person is practising the faith), and as a Pakistani I know that we are a long way from scientific enlightenment in different spheres of life. What goes on in Pakistan is this: some of us see the problem as a failure to practise good values, so they choose to learn and practise values, and this is where our youth take inspiration from learning Islam. Others consider scientific progress the way forward, and so they take inspiration from science; a few of them actually learn it or master it, but most are satisfied with finding heroes in science (of course, scientific enlightenment is not easy for everyone). I personally believe that the win-win here is to have people who address both issues simultaneously. They may choose to gain expertise in one, but they should have an understanding of both. Many of us have heard the phrase "education brings progress". I agree, but I feel that people who say it sometimes do not comprehend its deeper meaning, because education is not only about science (which is what some people actually mean by it) but also about learning good values and practising them. The last thing a progressive society would want is to find itself with good technology but barbaric behaviour towards each other (I don't see harmony there). Some examples where we are deficient in good values are the following:<br />
<br />
1- We comment on others' mistakes, but when someone points out our own error, that becomes the last day of the friendship, with an implicit expression of "how dare you?"<br />
2- We are friends with people but say bad things behind their backs.<br />
3- We follow a political party, and the wrongs of that party become justified because other political parties are also doing wrong.<br />
4- We break our promises so easily: "I will be there at 8:00 PM", but at 9:00 PM there is still no sign of me.<br />
5- At weddings, showing off money is more important than actually collecting the guests' prayers for the new couple.<br />
<br />
<br />
<br />
I will conclude my article by showing how much we actually respect ethical lines (ours is Islam) and scientific progress as a nation.<br />
<br />
<blockquote class="tr_bq">
"If a child is not sharp or intelligent send him/her to Madersa (Islamic school) and if a person is not fit for a challenging job then let him/her do a PhD degree". </blockquote>
<br />
We need our brightest minds in Islamic schools and, likewise, we need our brightest minds in PhD programs, but so far our collective effort as a nation is less than what is sufficient.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgc9-QxJ-uGk5L6lPijjkOfIDWh3-brfjAvoHzrZ4V0ou1WC3Q2t6f5_g_4_YalTfJD_Eyj5mUyJqgl_nhCacURXv17tDo6mvRDUzCHi5BHNAO2iML0Dl2SvyPrTzv4of9L7QqQrg0NUl3l/s1600/progress-01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgc9-QxJ-uGk5L6lPijjkOfIDWh3-brfjAvoHzrZ4V0ou1WC3Q2t6f5_g_4_YalTfJD_Eyj5mUyJqgl_nhCacURXv17tDo6mvRDUzCHi5BHNAO2iML0Dl2SvyPrTzv4of9L7QqQrg0NUl3l/s1600/progress-01.png" /></a></div>
</div>
Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com4tag:blogger.com,1999:blog-943647122747979667.post-46662241558222929872014-05-11T16:13:00.000-07:002014-05-11T16:57:38.176-07:00[SIGIR2014 Demo] A System-Oriented View Towards Bias in Search Process: Visualizing Perspectives in News<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
The notion of <b><i>"bias"</i></b> in search results has been investigated by the information retrieval research community. However, so far all investigations seem to take a user-oriented view of "bias" when considering the search process i.e., the user's tendency to click on results that are highly positioned within a search engine's ranked list, or the user's tendency to click on results that have more query terms in the search result title or summary. In the <a href="http://www3.it.nuigalway.ie/cirg/papers/de25-qureshi.pdf">proposed demonstration accepted at SIGIR 2014</a>, we take a system-oriented approach towards <i><b>"bias"</b></i> within the search process and offer a new interface for users to investigate the<i><b> "perspective biases"</b></i> in documents returned by a search engine.</div>
<div style="text-align: justify;">
To clearly illustrate what we mean by <i><b>"perspective bias"</b></i>, we have focused on the news domain where the inherent bias lies for the most part within the news collection itself such as news web sites having a "leftist" or "rightist" agenda. Consider a case in which a user wishes to find information about a certain event (say, a bomb blast in a certain region). The search results returned may be polarized instead of focusing on factual aspects i.e., relating to a certain race, ethnicity, or political movement which caused violence. This can prompt a user to explicitly evaluate a move from objective factual reporting to subjective reporting within the top results and this is where perspective-aware search comes to the rescue as shown in the figure below. Here, the user is asked to input a normal search query and a perspective allowing the user to highlight the presence of a perspective in the search results. <br />
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNOIBN2VIS7SJOx8ZLvYfJl8fAL-EjfMzNVKjFazkd_B8nfnUL-gBsXXnNrvcgzObvwOpx_hoAXQYHmBE_dr80Jog6pCCyxSL8rDBFdLDTP67wq_z93tjzpB_fjhFEHecCgNWlvLeasceR/s1600/perspective_search_prompt.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNOIBN2VIS7SJOx8ZLvYfJl8fAL-EjfMzNVKjFazkd_B8nfnUL-gBsXXnNrvcgzObvwOpx_hoAXQYHmBE_dr80Jog6pCCyxSL8rDBFdLDTP67wq_z93tjzpB_fjhFEHecCgNWlvLeasceR/s1600/perspective_search_prompt.png" height="110" width="400" /></a></div>
<br /></div>
For the purpose of demonstration, the system returns the top 10 news stories for the query from Bing, Yahoo and Google and then calculates a perspective score for each result while at the same time using graph visualizations to illustrate the perspective scores for each news source and each search engine.<br />
<br />
Below is a video demonstration of the "perspective-aware search system". I will be attending SIGIR 2014 in Gold Coast, Australia, and for that I owe special thanks to the SIGIR Travel Grants Committee, which has funded my travel to SIGIR 2014. See you Information Retrieval folks in Australia, where I will be available to explain more aspects of this novel search interface.<br />
<br />
<center>
<iframe allowfullscreen="" frameborder="0" height="315" src="//www.youtube.com/embed/mPO763z6H4Y" width="560"></iframe>
</center>
</div>
Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com0tag:blogger.com,1999:blog-943647122747979667.post-64302260280378491692013-08-18T17:22:00.001-07:002013-08-20T13:19:09.548-07:00User-Defined Query Term Weighting in Lucene<div dir="ltr" style="text-align: left;" trbidi="on">
I am sharing some simple code, with an explanation, showing how Lucene (PyLucene, to be specific) can be used for Query Expansion.<br />
<br />
What I will not discuss here is how to devise a strategy for finding new terms for Query Expansion (you can implement this on your own). What I will explain is how to assign different weights to query terms for a retrieval task.<br />
<br />
Consider four documents with the following content:<br />
D1 -> 'pagerank pagerank algorithm'<br />
D2 -> 'pagerank algorithm algorithm'<br />
D3 -> 'pagerank'<br />
D4 -> 'algorithm'<br />
<br />
This means the vocabulary of our corpus is just 'pagerank' and 'algorithm'; each term has a corpus frequency of 4 and a document frequency of 3. Since both terms have identical statistics, idf and cf do not affect how the two terms are scored relative to each other.<br />
<br />
In the linked source code you can see that we have boosted the term 'pagerank' by a factor of 10 compared to the term 'algorithm'. The query is 'pagerank algorithm'.<br />
<br />
Upon retrieval, document D3 has a score 10 times higher than D4, and likewise D1 has a higher score than D2 (but not 10 times higher, since these documents contain three terms in total, including both query terms, which influences the scoring, unlike in the previous case). Please run the source code and observe the results.<br />
<br />
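To see the effect of the boost without installing PyLucene, here is a minimal sketch in plain Python. It is not Lucene's actual scoring formula: toy_score is a hypothetical helper that only sums boost * sqrt(term frequency) over the query terms. That is an assumption which keeps the tf and boost factors and leaves out idf, length normalisation and query normalisation (idf is identical for both terms in this corpus, and the documents compared within each pair have the same length).<br />

```python
import math
from collections import Counter

# The four toy documents from the post.
DOCS = {
    "D1": "pagerank pagerank algorithm",
    "D2": "pagerank algorithm algorithm",
    "D3": "pagerank",
    "D4": "algorithm",
}

# Query 'pagerank algorithm' with 'pagerank' boosted 10x, as in the post.
QUERY_BOOSTS = {"pagerank": 10.0, "algorithm": 1.0}


def toy_score(text, boosts):
    """Sum boost * sqrt(tf) over the query terms (simplified, NOT Lucene's full formula)."""
    tf = Counter(text.split())
    return sum(boost * math.sqrt(tf[term]) for term, boost in boosts.items())


scores = {name: toy_score(text, QUERY_BOOSTS) for name, text in DOCS.items()}

# Print the documents from highest to lowest toy score.
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 2))
```

In this simplified model D3 scores exactly 10 times D4, because each matches a single query term and only the boost differs, while D1 scores only about 1.3 times D2, because both documents match both query terms and the boost applies to just one of them.<br />
<br />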
Source code: <a href="http://codeviewer.org/view/code:35db">http://codeviewer.org/view/code:35db</a><br />
<br />
Version: pylucene-3.6</div>
Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com0tag:blogger.com,1999:blog-943647122747979667.post-64878802851913193812013-07-27T11:57:00.002-07:002015-02-11T14:50:09.824-08:00EuroHCIR2013 Work Towards a New Search Interface namely Perspective-Aware Search<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
<div dir="ltr" style="text-align: left;" trbidi="on">
Recently an updated version was demoed in SIGIR 2014: <a href="http://dl.acm.org/citation.cfm?id=2611184">http://dl.acm.org/citation.cfm?id=2611184</a> <br />
<br />
There are occasions when search results do not satisfy the information need and instead give a set of results completely different from what the user is looking for. A possible reason lies inside the returned documents, which carry certain perspectives while covering the topic, and the user may perceive such a perspective as bias.</div>
<div dir="ltr" style="text-align: left;" trbidi="on">
Let's take the following example scenarios:<br />
<ul style="text-align: left;">
<li>Consider a case where a user wishes to find information about a certain event (say, a bomb attack in a certain region). The search results returned contain a majority of news reports that blame Islam (through their implicit writing style), relating it to terrorism in most cases. This prompts the user to explicitly examine how strongly Islam is related to terrorism in the returned set of search results. </li>
<li>Consider another case where a user wishes to find information about the roles and rights of women in Islam, but the search engine returns articles that tend to highlight oppression of women instead of women's rights and roles. In this case the user observes a correlation between women and oppression instead of a factual position on rights. </li>
</ul>
In the above cases, the user's information need may lead him or her towards an explicit investigation of the underlying document collection, and he or she may be interested in observing the extent of perspective tendencies in various search results (e.g., news reports). Current search engines do not meet this need, as they do not highlight perspectives while displaying search results. Hence, we propose the concept of <i><b>"perspective-aware search."</b> </i>The proposed search interface enables the user to explicitly analyze search results with a touch of perspective awareness.<br />
<br />
The following presentation contains some screen-shots of the proposed search interface; I will be giving a demo of this system at EuroHCIR Workshop that is co-located with SIGIR2013.<br />
<br /></div>
<iframe frameborder="0" height="400" src="http://prezi.com/embed/vkht44ust5xv/?bgcolor=ffffff&lock_to_path=0&autoplay=0&autohide_ctrls=0&features=undefined&disabled_features=undefined" width="550"></iframe></div>
<br />
The system is built on top of the <a href="http://www3.it.nuigalway.ie/cirg/prj/WikiMadeEasy.html">WikiMadeEasy API</a> which is an API for mining Wikipedia data and is the output of work I am doing towards my PhD thesis. Feel free to contact me for more details of the API. The full paper describing the system can be found <a href="http://websci.iba.edu.pk/Research/Papers/eurohcir2013_submission.pdf">here</a>. </div>
Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com0tag:blogger.com,1999:blog-943647122747979667.post-73885519836812980042013-02-05T07:19:00.002-08:002013-02-05T08:21:45.339-08:00Python: Reading large bz2 file with bz2.BZ2File()<div dir="ltr" style="text-align: left;" trbidi="on">
When reading a bz2 file in Python, you may find that the file is read only partially (incompletely).<br />
<br />
The tip to overcome this problem is very simple: decompress the bz2 file using an extraction utility (Ubuntu ships a graphical one by default). Once it is extracted, compress it again as bz2 and try reading it once more; this time the problem may be solved.<br />
<br />
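If the file really consists of several bz2 streams concatenated together (the likely cause discussed in this post), re-compressing can be avoided by decompressing stream by stream. Below is a minimal sketch under that assumption. read_multistream_bz2 is a hypothetical helper name, not part of the bz2 module, and it relies on the decompressor's eof attribute, which needs Python 3.3 or later. On those versions bz2.BZ2File and bz2.open already handle multi-stream files themselves, so the sketch mainly illustrates what happens underneath.<br />

```python
import bz2
import os
import tempfile


def read_multistream_bz2(path, chunk_size=64 * 1024):
    """Yield decompressed byte chunks from a bz2 file holding one or more streams."""
    decomp = bz2.BZ2Decompressor()
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            while chunk:
                data = decomp.decompress(chunk)
                if data:
                    yield data
                if decomp.eof:
                    # End of one stream: leftover bytes belong to the next one,
                    # so hand them to a fresh decompressor.
                    chunk = decomp.unused_data
                    decomp = bz2.BZ2Decompressor()
                else:
                    chunk = b""


# Demo: two independently compressed parts concatenated into one file.
with tempfile.NamedTemporaryFile(suffix=".bz2", delete=False) as tmp:
    tmp.write(bz2.compress(b"first part\n") + bz2.compress(b"second part\n"))
    demo_path = tmp.name

result = b"".join(read_multistream_bz2(demo_path))
os.remove(demo_path)
print(result)
```

A decompressor that stops after the first stream would return only the first part here, which matches the partial read described above.<br />
<br />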
Reason for the problem: whoever produced the bz2 file may have built it from multiple files (presumably as several concatenated bz2 streams), which is not handled well by bz2.BZ2File() in Python.</div>
Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com1tag:blogger.com,1999:blog-943647122747979667.post-80410049032110387732012-04-25T08:56:00.000-07:002012-04-25T09:56:59.580-07:00WWW2012: Solving the Media Crisis<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
Last week marked a significant period for the Internet as <a href="http://www2012.wwwconference.org/">Lyon in France hosted the famous scientific conference World Wide Web (WWW)</a> and was declared the world's Web capital for that week. Few people know that this is the same conference where Google was born, as this is where Google co-founders Larry Page and Sergey Brin first <a href="http://ilpubs.stanford.edu:8090/361/">presented their PageRank algorithm</a>. This year's WWW conference was highly political in nature, mainly due to the changing nature of today's Web and its significant role in major events all over the world. Many sessions focused on how the <a href="http://blogs.tribune.com.pk/story/9920/why-comments-and-likes-matter-in-the-new-media-world/">voices on new media (i.e. social media) are affecting the traditional media</a>, both in terms of new paradigms of news dissemination and the credibility of that news. In particular, one panel named <a href="http://smane2012.socialsensor.eu/">"Social Media Applications in News and Entertainment"</a> comprised journalists and media persons from famous European media outlets (BBC, Deutsche Welle and AFP, Agence France-Presse). Twitter was the platform of choice for all the journalists on account of its speed and ease of use. However, all the journalists on the panel pointed to one concern, namely the credibility of news on social media and methods for its verification. They gave some examples of false reports disseminated through social media platforms. Denis Teyssou of AFP told the audience how a fake picture of Osama bin Laden's death circulated on social media in 2007; AFP then used computational graphics software to detect that the picture was fake and the news was false. 
A similar problem arose during the 2010 Haiti earthquake <a href="http://www.smh.com.au/digital-life/digital-life-news/bloggers-jump-gun-with-wrong-photos-20100114-ma7x.html">when a picture of the 2008 Sichuan earthquake was included in a slideshow of 54 images from Haiti by Daily News</a>. The photo was traced back to a random social media activist, and Daily News was heavily criticized by social media and traditional media circles alike. The point in focus is the huge challenge traditional media outlets face due to faster means of news dissemination: the problem is huge, the criticism from various circles is immense and time is critical. Even the recent Bhoja Air tragedy and the <a href="http://blogs.tribune.com.pk/story/11295/bhoja-air-crash-where-were-our-media-ethics/">corresponding burst of criticism of the traditional media by various social media activist</a>s point to the need for a solution to this traditional vs. social media crisis. We all speak of problems and shortcomings in various aspects of the media, but when it comes to solutions we are somewhat clueless, and this has to change for the better.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqhMOsAwDfDngO_dz1Tf_xyOm4dB8q1qdfjQrdF-reD7I1y_Khiy8IR60eeh88gZm5K5hpGTMEq_yP1kBnzJE5tcuC2jeIesqtBEBbp6LBeaLiWT6HEu-OQukEyRlCpUKT0g9W3sL2oDp1/s1600/img_0013.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqhMOsAwDfDngO_dz1Tf_xyOm4dB8q1qdfjQrdF-reD7I1y_Khiy8IR60eeh88gZm5K5hpGTMEq_yP1kBnzJE5tcuC2jeIesqtBEBbp6LBeaLiWT6HEu-OQukEyRlCpUKT0g9W3sL2oDp1/s320/img_0013.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Osama bin Laden fake death picture sent to AFP by a random social media activist</td></tr>
</tbody></table>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<table cellpadding="0" cellspacing="0" class="tr-caption-container" style="float: right; margin-left: 1em; text-align: right;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-uV9xC0kbbHtqSxrdry0wGDDTKixfxZNnaxTi5qMGayjMz0XVYvoA7zDIEX93wmqNUiZQDUWZoq1PTOD4HiJBnvNT9sxIunB2GuUOECTEI8WR8ZNOyMH8Or_u34VrQx4mTLChRjBsN9cw/s1600/img_0014.jpg" imageanchor="1" style="clear: right; margin-bottom: 1em; margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-uV9xC0kbbHtqSxrdry0wGDDTKixfxZNnaxTi5qMGayjMz0XVYvoA7zDIEX93wmqNUiZQDUWZoq1PTOD4HiJBnvNT9sxIunB2GuUOECTEI8WR8ZNOyMH8Or_u34VrQx4mTLChRjBsN9cw/s320/img_0014.jpg" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Sichuan earthquake picture that was labeled as a Haiti earthquake picture<br />
by a random social media activist</td></tr>
</tbody></table>
<br />
Being a Computer Scientist with an analytical eye makes one approach problems differently: this was the key point highlighted by all the journalists during WWW2012, and Computer Science is the field to which journalists are turning for solutions. Most of them admitted that the advent of citizen journalism has left many in the media industry baffled and some even outraged, as journalists are by nature arrogant people who are not in the habit of public engagement. Social media has changed this considerably, leading to more and more public-media engagement, but crucial problems remain; it is the solution to these problems that will bridge the trust gap between the public and the media.<br />
<br />
<br />
As part of a panel on Social Computing and Social Machines: A Research Agenda, in the Web Science track of WWW2012, I could relate to many of the concerns raised by the journalists, and interaction with various panelists further strengthened my hypothesis about a marriage between Computer Science and journalism. A similar belief is held by sociologists and psychologists, with <a href="http://www.guardian.co.uk/news/datablog/2011/jul/28/data-journalism">data journalism seeming to be the way forward for all of us</a>. As is the case with most analytic fields, not many in the Pakistani media industry are familiar with data journalism, and they lag far behind well-established names such as the Guardian, Huffington Post, and BBC in this area. It could very well be that the solution to the media crisis lies right before us and we, in our arrogance or, to put it softly, ignorance, are not turning towards it. Both the traditional media and social media circles have their own sets of problems, and the way I see it, collaboration can help solve the problems of both, with Computer Science a good means of attaining that goal. Problems such as these are what gave birth to the new field of Web Science, which aims to bring together scientists from various disciplines, as the Web Science diagram shows.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1QznW61ua5KXpauzKBfnWv0ctfGPCzXDZZTWvGiY9uGhZd83aZuhOu8CDfstZK6SsMu9aJufLx_KAw8T80FUB06RH4tf7mBphRGjLQGo49pgi-lJxeqQVULKtC2IH8dWFcvS9DfMVYwFm/s1600/websicience.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="290" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1QznW61ua5KXpauzKBfnWv0ctfGPCzXDZZTWvGiY9uGhZd83aZuhOu8CDfstZK6SsMu9aJufLx_KAw8T80FUB06RH4tf7mBphRGjLQGo49pgi-lJxeqQVULKtC2IH8dWFcvS9DfMVYwFm/s400/websicience.jpg" width="400" /></a></div>
<br />
Having been a programmer for quite some time now, and having been in the Computer Science research arena for about 3-4 years, I along with my colleague and wife <a href="http://arjumand-atif.blogspot.com/">Arjumand Younus</a> have set up a small Web Science research group at the Institute of Business Administration in Karachi, Pakistan, and this year our biggest achievement so far came in the form of two research works by WebST IBA at WWW2012. Both works focus on the media crisis. The first one, titled <a href="http://investigating%20bias%20in%20traditional%20media%20through%20social%20media/">"Investigating Bias in Traditional Media through Social Media"</a>, quantitatively studies bias in traditional media platforms through social media text mining and uses various text-mining similarity measures to identify differences in how the two platforms report a news event. The second one, titled <a href="http://www2012.wwwconference.org/proceedings/webscience/wwwwebsci2012_qureshi.pdf">"Traces of Social Media Activism from Malaysia and Pakistan"</a>, studies regional differences in the activity of social media activists from two regions, namely Malaysia and Pakistan, and reports some significant findings. We came to the conclusion that social media activists in Pakistan tend to post a large number of mentions and retweets, thereby endorsing each other's statements, whereas Malaysian activists tend to post status updates more frequently. 
The research includes many other findings; feel free to contact us at the <a href="http://websci.iba.edu.pk/">Web Science lab of IBA</a> for further insights or if you are interested in joining any of these projects.</div>Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com0tag:blogger.com,1999:blog-943647122747979667.post-64446362033678818832011-08-21T11:43:00.000-07:002011-08-21T12:18:05.549-07:00My visit to Russia, RuSSIR/EDBT 2011<div>The past week of my life (14 Aug. - 20 Aug. 2011) was a very exciting experience. Along with my wife and colleague <a href="http://arjumand-atif.blogspot.com/">Arjumand</a>, I went to St. Petersburg (Russia) for the RuSSIR/EDBT summer school and international conference centered around Information Retrieval (Computer Science). This was my first ever visit to Russia and I was a little unsure about how it would be (as very little is known about Russia in Pakistan). Getting there after long hours of flying drained a lot of my energy. However, as soon as I arrived I found my surroundings as normal and helpful as they had been in South Korea, China and Malaysia (my previous experiences); everything looked familiar and people were cooperative. The RuSSIR/EDBT team arranged a nice hostel for the two of us, which was in fact a studio apartment and better than anywhere I had stayed during my previous short trips. The Russian people could barely speak English but were polite and helpful (thumbs up!).</div><div>
<br /></div><div>On 15 Aug. we travelled to the school venue by bus, followed by the metro (i.e., the subway) and then a short walk to the campus of St. Petersburg State University. We had a little trouble finding our destination but managed to get there by asking passers-by. There were fine lectures for five full days as part of the summer school, and nice papers and posters during the conference. Our daily routine from 15 to 19 Aug. ran from 9:00 AM till 9:00 PM, and every second of it paid off in terms of learning. In the breaks we had chances to socialize with people, forming connections for possible future collaboration. The suggestions and well-phrased questions we received gave us feedback that would usually be impossible to gather within a week. During the gathering, we saw people from the social sciences taking a keen interest in aspects of social networks, which shows how aware they were of the fields merging. A person from industry, from Yandex (Russia's counterpart to Google), came to talk to me after reading my research paper, which shows how practical and serious things were there. Similarly meaningful was my interaction with someone from mail.ru (another giant of the Russian industry). All in all it was a very nice experience, and I felt like writing something about it as a token of appreciation.</div><div>
<br /></div><div>I close my post by thanking the RuSSIR/EDBT team for organizing such an event, and I also want to thank the participants, who added five more stars to the gathering.</div><div>
<br /></div><div>As a note for Pakistani people who know very little about Russia: it is a nice place with nice people, and very open to knowledge cooperation and collaboration; do consider it a meaningful option for academia and industry. Further, feel free to write back to me.</div><div>
<br /></div><div>RuSSIR/EDBT 2011 link: <a href="http://romip.ru/edbt-russir2011/">http://romip.ru/edbt-russir2011/</a></div>Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com2tag:blogger.com,1999:blog-943647122747979667.post-11118485893891825042011-03-16T01:07:00.000-07:002011-03-27T17:20:48.352-07:00WWW2011: Fellowship Achieved, Research Paper Approved But Indian VISA DeniedI had planned to attend <a href="http://www.www2011india.com/">WWW 2011</a> this year, which begins today. Every researcher working on the Web and its related technologies knows that WWW is the greatest academic conference for Web research, with an acceptance rate below 20%. It is the same conference where Google founders Larry Page and Sergey Brin presented their famous PageRank algorithm. This year's venue, Hyderabad, India, was particularly exciting for me as I had always wished to visit my neighboring country India, and this year felt like a natural chance. <div><br /></div><div>WWW 2011 is being organized by IIIT Bangalore, and they also announced a few fellowships for attending the conference. I applied for the NIXI fellowship, a program aimed at encouraging faculty and students from under-represented parts of the world (e.g., developing countries and poor regions of developed countries) to have representation at this year's WWW, whose theme was "Web for All". Hence, in line with the theme, preference for the NIXI fellowship was given to those who had not attended a WWW conference in the past, who were from a region without considerable presence at WWW, and who had a demonstrated interest in the Web and its technologies. Based on my publications related to the Web and my credentials in this field, I got the NIXI fellowship. 
My wife and I were to be the only Pakistanis attending WWW 2011, and this, to the best of my knowledge, would have been the first ever representation of Pakistanis at a WWW conference.</div><div><br /></div><div>Then began the process of applying for the Indian VISA, which turned out to be the bottleneck. I was short on time, and the Indian VISA application center in Seoul informed me that the process for Pakistani passport holders takes a month. I then approached the Indian Embassy in Seoul and found them to be extremely co-operative; they called me to their office in Seoul and were ready to give me special consideration on account of my research profile. They said they could give me the VISA in one day if I showed them clearance from the Ministry of Home Affairs, India, which happens to be the main requirement for getting a conference VISA for India. And then began a huge round of email exchanges with the WWW Secretariat, who happened to be from IIIT Bangalore and were highly, highly co-operative throughout. Despite their tremendous efforts, the Ministry of Home Affairs (MHA), India refused to give me clearance, without stating any reason for the denial.</div><div><br /></div><div>IIIT Bangalore and their staff did all they could (they were simply great people to interact with). However, MHA stood against the values of science and was unable to understand the importance of scientific merit. India's leading schools would learn a lot from this; MHA is shutting the doors of science (without even bothering to give any sort of explanation). I was invited there with delegate status, and even then MHA ignored this, seemingly because I carry a Pakistani passport, despite my standing on merit (i.e., an accepted research paper and the fellowship). If such profiles are not invited, then who else could they invite, naturally and intellectually? I believe there is a lesson to learn on both sides (especially for intellectuals, who could turn things around by using their intellect). 
MHA has no idea that they are killing science, and I am not the only one who was denied: one German scientist was asked for a birth certificate (God knows what on earth they are thinking) and a Russian scientist was denied a VISA because her passport was due to expire within 3 months (even though she is a renowned name in the field and co-founder of the famous <a href="http://tweetedtimes.com/">www.tweetedtimes.com</a>).</div><div><br /></div><div>So today, WWW 2011 begins in Hyderabad, India. I had thought of writing a blog post covering the various sessions of the conference, but it was not to be. Despite all that, I wish all the best to the WWW 2011 organizers and attendees; I would love to see the tweets coming from there: #www2011. I have already read some of the WWW 2011 papers and it seems to be a pretty exciting conference this year.</div>Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com2tag:blogger.com,1999:blog-943647122747979667.post-1207283479654336542011-02-14T04:19:00.000-08:002011-02-14T04:43:55.469-08:00The Yellow Banded GraduatesThese were my feelings on the day of the KAIST Graduation Ceremony 2011, i.e., 11th February, 2011. I have found the time to post this on my blog today. Here are my feelings from that day:<br /><br />Finally, the long awaited day has arrived: the tradition of passing on the light at KAIST. The difference this time: my wife and I receive two MS by Research degrees, as the only Pakistanis in Computer Science, from the university ranked 21st in the world (this year 24th) in Technology (simply, the Asian MIT). With a lot of hopes and a mix of emotions, the yellow banded graduates head to the ceremony. This is a product of the hard work of everyone associated with us (in terms of academics, advice, emotions, etc.) and, importantly, a responsibility to pay back to the seed land, Pakistan. I never felt it before, but the truth is that every single degree counts for Pakistan (part of the Muslim world). 
Hopes ran high as we went on to receive something that looked impossible if I flash back more than two and a half years, to a time with no money and no sense of how to pursue higher education in the ever-challenging field of Computer Science. I can't mention everyone who stood behind us in the tough times, but one person I can never forget to mention is my mother, the lady who made me what I am and guided me after the passing of my father, when I had just completed seven grades of basic schooling. My mother made me look at the world through her eyes and turned me into a yellow banded KAISTIAN. She completed her PhD after the passing of my father and taught me, by her living example, to aim higher than the known limits of the world: a lady who not only took great care of her children but also attained the biggest pride in education (i.e., a PhD) after the passing of her life partner, my father. Her example shaped me well, and I am taking not one degree but two: my own and my wife's.<br /><br />Having shared some of my background, here are some snaps that have now become part of golden memories :)<div><br /></div><div><br /></div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJIxJOlkGwyjip148pDZM5F2WmekXLOU-is8DFE-VxCFrKhZkPu54ruc7LOUQO0gGnvhcbmUKSx0go1GvmktemQMmPnoOzZjv_OExbl8rPneixbCkNtOHGfufxe9nOZAwl6ikNSdhrgazt/s1600/IMG_2167.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJIxJOlkGwyjip148pDZM5F2WmekXLOU-is8DFE-VxCFrKhZkPu54ruc7LOUQO0gGnvhcbmUKSx0go1GvmktemQMmPnoOzZjv_OExbl8rPneixbCkNtOHGfufxe9nOZAwl6ikNSdhrgazt/s400/IMG_2167.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5573519624501722722" /></a><br /><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" 
href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2mjlWsiOJH5206h0JX2LfKBK1cCAJPt-IYjzUaPsCqvbRe7M1OS6qyRSIOTMAgLHwaDue3dBXMhWORFK80cvcplbUHUacfm2skS0UqcuquurQx7sa4xR7I8cVMeLFKeJWReFI1zYKp99N/s1600/IMG_2179.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 300px; height: 400px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2mjlWsiOJH5206h0JX2LfKBK1cCAJPt-IYjzUaPsCqvbRe7M1OS6qyRSIOTMAgLHwaDue3dBXMhWORFK80cvcplbUHUacfm2skS0UqcuquurQx7sa4xR7I8cVMeLFKeJWReFI1zYKp99N/s400/IMG_2179.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5573520241271143346" /></a>Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com6tag:blogger.com,1999:blog-943647122747979667.post-10367529003892050732011-01-15T03:33:00.000-08:002013-12-05T15:46:04.027-08:00Do's and Don'ts: Organization of Research Talk<div dir="ltr" style="text-align: left;" trbidi="on">
<div>
<div style="text-align: justify;">
<div style="text-align: justify;">
<span class="Apple-style-span">People have been asking me this question (i.e., how to organize a research talk with slides) for some time, and I have been replying in private (i.e., over chat, email and other means). However, I have now decided to write a blog post on the subject for a wider audience.</span></div>
<div style="text-align: justify;">
<span class="Apple-style-span"><br /></span></div>
<div style="text-align: justify;">
<span class="Apple-style-span">Actually, there is no perfect answer to this, but there are a few tips that may be meaningful and applicable after some adjustment for the area of research (it also depends on the maturity of the research being presented).</span></div>
<div style="text-align: justify;">
<span class="Apple-style-span"><br /></span></div>
<div style="text-align: justify;">
<span class="Apple-style-span">Some tips regarding a research talk (presented with slides), as do's and don'ts, are as follows.</span></div>
</div>
<div style="font-size: 10pt; text-align: justify;">
<span class="Apple-style-span"><br /></span></div>
<div style="text-align: justify;">
<b><span class="Apple-style-span">Don'ts</span></b></div>
<div style="font-size: 10pt;">
<ol>
<li style="text-align: justify;"><span class="Apple-style-span">Never paste an abstract verbatim into your slides; nobody can be expected to read it like a paper during your talk.</span></li>
<li style="text-align: justify;"><span class="Apple-style-span">Never copy sentences; instead, phrase them as concepts, because a slide is not a collection of flat sentences. It is expected to be an organization of concepts; nobody will read it like a book during your talk (they do not have the time, nor is it expected of a talk). </span></li>
<li style="text-align: justify;"><span class="Apple-style-span">Never use others' content without a reference. It is better to cite with a hint of the publication year, such as BP98 or YZ10, so that the viewer can tell how recent the work is. References such as [1], [2], etc. are not helpful in a talk. Remember, a talk is different from a book or paper.</span></li>
<li style="text-align: justify;"><span class="Apple-style-span">Never fill a single slide with too much content; that is not understandable during a talk (nobody reads that much on one slide).</span></li>
<li style="text-align: justify;"><span class="Apple-style-span" style="font-family: 'trebuchet ms';">Never discuss a concept in such depth that it takes more than a minute (approx.); a slide is not for discussing concepts in detail.</span></li>
<li style="text-align: justify;"><span class="Apple-style-span">Never explain a concept verbally without putting it on your slide. People follow your talk by looking at your slides.</span></li>
<li style="text-align: justify;"><span class="Apple-style-span" style="font-family: 'trebuchet ms';">Never use a bulleted or numbered list if there is no parallel concept to list alongside. The definition of an apple is neither a bullet nor a numbered list.</span></li>
</ol>
</div>
<div style="font-size: 10pt; text-align: justify;">
<span class="Apple-style-span"><br /></span></div>
<div style="text-align: justify;">
<b><span class="Apple-style-span">Do's</span></b></div>
<div>
<ol>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span" style="font-size: small;">Always put slide numbers so that people can refer to a slide number later when asking questions.</span></li>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span" style="font-size: small;"></span><span class="Apple-style-span" style="font-size: small;">Always organize content as a conceptual hierarchy.</span></li>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span" style="font-size: small;"></span><span class="Apple-style-span" style="font-size: small;">Always put two parallel concepts at the same bullet level. Apple is a parallel bullet to orange if the focus is fruits.</span></li>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span" style="font-size: small;"></span><span class="Apple-style-span" style="font-size: small;">Always put sub-concepts as sub-bullets according to their logical association. Apple may have sub-bullets such as green apple and red apple.</span></li>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span" style="font-size: small;"></span><span class="Apple-style-span" style="font-size: small;">Always use numbered lists if the concepts are ordered; otherwise use bullets. An algorithm is a sequential flow, so use a numbered list.</span></li>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span" style="font-size: small;"></span><span class="Apple-style-span" style="font-size: small;">Always remember that every conceptual abstraction should be completely meaningful; for example, an overview or observation should map 100% to the concepts discussed under it.</span></li>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span" style="font-size: small;"></span><span class="Apple-style-span" style="font-size: small;">Always use a "(1/x)" style (or similar) for concepts that expand over more than one slide, where x is the total number of slides over which the concept is expanded.</span></li>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span" style="font-size: small;"></span><span class="Apple-style-span" style="font-size: small;">Always put supplementary slides at the end, so that if a question strays a little from the main slides you can take help from them (if needed).</span></li>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span" style="font-size: small;"></span><span class="Apple-style-span" style="font-size: small;">Always use consistent conventions, such as either Fig. or Figure. Consistent means that Figure and Fig. cannot both be used in the same set of slides.</span></li>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span" style="font-size: small;"></span><span class="Apple-style-span" style="font-size: small;">Always read your slides after preparing them completely. Take a printout and examine it carefully; you will generally find some errors in it.</span></li>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span" style="font-size: small;"></span><span class="Apple-style-span" style="font-size: small;">Always begin your slides by laying out the contents of your talk. The main headings should be very general: no unfamiliar term that requires explanation should be a main heading, except the title of your thesis. The following is one good example.</span></li>
</ol>
</div>
<blockquote>
<div>
<ol>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span" style="font-size: small;"><span class="Apple-style-span"><b>Introduction</b>: It covers background work, i.e., basic definitions of the objects relevant to your work (preliminary definitions), and then connects the discussed objects in a story line as the evolution of your focus area. Finally, link your problem statement to it (as a natural evolution of the subject). Motivation and goal are extremely important in this part, followed by the contribution of your work (which is explained in the coming parts of your talk).</span></span></li>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span"><span class="Apple-style-span" style="font-size: small;"></span><span class="Apple-style-span" style="font-size: small;"><b>Related Work</b>: In this part, you are required to present previous related work in your area of research. You are expected to defend your idea against the presented related work in a later part (i.e., the experimental part) by comparing results (i.e., the effectiveness of your work). If not all, then at least some related works are extremely important to compare against.</span></span></li>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span"><span class="Apple-style-span" style="font-size: small;"></span><span class="Apple-style-span" style="font-size: small;"><b>Your Work or Title borrowed from Thesis</b>: In this part, you are expected to explain your idea and link all the concepts together as if they were pieces of a puzzle fitting together. This is your own part of the talk, i.e., what is fresh and new relative to other works.</span></span></li>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span"><span class="Apple-style-span" style="font-size: small;"><span class="Apple-style-span" style="font-size: 16px;"><b style="font-size: small;">Evaluation of your work/Experimental Evidence</b><span class="Apple-style-span">: </span></span>In this part, you are required to show the comparison of your work with other established works in order to justify your contribution.</span></span></li>
<li style="font-size: 10pt; text-align: justify;"><span class="Apple-style-span"><span class="Apple-style-span" style="font-size: small;"></span></span><span class="Apple-style-span" style="font-size: small;"><b>Conclusion</b>: Finally, you are supposed to summarize your findings.</span></li>
</ol>
</div>
<div style="font-size: medium;">
</div>
</blockquote>
<div style="text-align: justify;">
<div style="text-align: justify;">
<span class="Apple-style-span">The following is a link to my Master's by Research defense (which took place on 16 Dec. 2010). You can use it as a sample template (i.e., for how a talk could be organized).</span></div>
<div style="text-align: justify;">
<span class="Apple-style-span"><br /></span></div>
<div style="text-align: justify;">
<a href="http://randomcsthoughts.blogspot.com/2011/01/masters-thesis-improving-quality-of-web.html"><span class="Apple-style-span">http://randomcsthoughts.blogspot.com/2011/01/masters-thesis-improving-quality-of-web.html</span></a></div>
</div>
<div>
<div style="font-size: 10pt; text-align: justify;">
<br /></div>
</div>
</div>
<div style="text-align: justify;">
<div style="text-align: justify;">
<span class="Apple-style-span">I also wrote a short post on "writing", which may be useful for understanding common problems in transferring ideas through writing.</span></div>
<div style="text-align: justify;">
<span class="Apple-style-span"><br /></span></div>
<div style="text-align: justify;">
<span class="Apple-style-span"><a href="http://randomcsthoughts.blogspot.com/2010/11/why-writing-is-important.html">http://randomcsthoughts.blogspot.com/2010/11/why-writing-is-important.html</a></span></div>
<div style="text-align: justify;">
<span class="Apple-style-span"><br /></span></div>
<div style="text-align: justify;">
<span class="Apple-style-span">Maintaining the above practice when preparing slides will give you insight into the worth of your own research work, i.e., you can judge (approximately) where your work actually stands. Therefore, by preparing good slides, one always helps oneself before anyone else. As a researcher, you are expected to be clear in transferring your ideas by speech. Making good slides is a science that a researcher must master.</span></div>
<div style="text-align: justify;">
<span class="Apple-style-span"><br /></span></div>
<div style="text-align: justify;">
<span class="Apple-style-span">If you find this post interesting, please share it with your friends. Sharing it will contribute either feedback for me (corrections, additions, questions, etc.) or some knowledge for other people. Always help the world, so that you may be helped in return through the cycle.</span></div>
</div>
</div>
Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com4tag:blogger.com,1999:blog-943647122747979667.post-75918591313805824742011-01-07T09:57:00.001-08:002015-06-09T15:39:38.234-07:00Master's Thesis: Improving the Quality of Web Spam Filtering by Using Seed Refinement<div dir="ltr" style="text-align: left;" trbidi="on">
I finally defended my Master's thesis titled "<b>Improving the Quality of Web Spam Filtering by Using Seed Refinement</b>" on 16th December, 2010. The thesis deals with Web spam, a significant problem for today's World Wide Web that has become a nuisance for search engines.<br />
<div>
<br /></div>
<div>
The thesis proposes seed refinement techniques for four well-known web spam filtering algorithms: <a href="http://en.wikipedia.org/wiki/TrustRank">TrustRank</a>, <a href="http://airweb.cse.lehigh.edu/2006/krishnan.pdf">Anti-TrustRank</a>, <a href="http://en.wikipedia.org/wiki/Spam_mass">Spam Mass</a>, and <a href="http://www.citeulike.org/user/MaineC/article/336502">Link Farm Spam</a>. The input seed is refined by maintaining an exception list in the input seed set. This proves to be helpful in decreasing false positives while increasing true positives. Additionally, the thesis proposes a strategy for the succession of the modified algorithms. They fall into two classes, and a seed refiner is followed by a spam detector: Modified TrustRank (<i>MTR</i>) and Modified Anti-TrustRank (<i>MATR</i>) are seed refiners, while Modified Spam Mass (<i>MSM</i>) and Modified Link Farm Spam (<i>MLFS</i>) are spam detectors. </div>
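<div>
As a rough illustration of the seed refinement idea, here is a minimal Python sketch of a TrustRank-style biased PageRank in which pages on an exception list are dropped from the seed set before trust is propagated. It is not the thesis's MTR implementation: the function name <code>trust_rank</code>, its parameters, the damping factor of 0.85 and the toy graph are all assumptions made for this example.</div>

```python
def trust_rank(graph, seeds, exceptions=frozenset(), damping=0.85, iterations=50):
    """Propagate trust from seed pages along out-links (biased PageRank).

    graph:      dict mapping a page to the list of pages it links to.
    seeds:      pages assumed trustworthy by a human or an oracle.
    exceptions: hypothetical exception list; seeds found here are dropped
                before propagation (the "refinement" step of this sketch).
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    refined = {s for s in seeds if s in nodes and s not in exceptions}
    if not refined:
        raise ValueError("seed set is empty after refinement")

    # Static bias vector: uniform over the refined seeds, zero elsewhere.
    bias = {n: (1.0 / len(refined) if n in refined else 0.0) for n in nodes}
    trust = dict(bias)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * bias[n] for n in nodes}
        for src in nodes:
            targets = graph.get(src, ())
            if not targets:
                continue  # pages without out-links pass no trust on
            share = damping * trust[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        trust = nxt
    return trust


if __name__ == "__main__":
    # Toy graph (made up): "bad_seed" was wrongly picked as a seed.
    toy = {"good": ["a"], "a": ["b"], "bad_seed": ["spam"], "spam": []}
    raw = trust_rank(toy, ["good", "bad_seed"])
    refined = trust_rank(toy, ["good", "bad_seed"], exceptions={"bad_seed"})
    print("spam trust without exception list:", raw["spam"])
    print("spam trust with exception list:   ", refined["spam"])
```

<div>
In the toy graph, putting <code>bad_seed</code> on the exception list stops any trust from reaching <code>spam</code>, which is the kind of false positive a refined seed is meant to avoid.</div>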
<div>
<br /></div>
<div>
Following is my Master's thesis defense presentation which I am sharing for those interested:</div>
<div>
<br /></div>
<br />
<br />
<center>
<iframe allowfullscreen="" frameborder="0" height="355" marginheight="0" marginwidth="0" scrolling="no" src="//www.slideshare.net/slideshow/embed_code/key/dKAk7TX6Y9gJbp" style="border-width: 1px; border: 1px solid #CCC; margin-bottom: 5px; max-width: 100%;" width="425"> </iframe> <div style="margin-bottom: 5px;">
<b> <a href="https://www.slideshare.net/MAtifQureshi/masters-thesis-defense-improving-the-quality-of-web-spam-filtering-by-using-seed-refinement" target="_blank" title="Master's Thesis Defense: Improving the Quality of Web Spam Filtering by Using Seed Refinement">Master's Thesis Defense: Improving the Quality of Web Spam Filtering by Using Seed Refinement</a> </b> from <b><a href="https://www.slideshare.net/MAtifQureshi" target="_blank">M Atif Qureshi</a></b> </div>
</center>
<br />
<br />
The full-text of the thesis can be downloaded from <a href="https://drive.google.com/file/d/0B12fqNLbk48zR0ZOUzNEWXpPR2c/view?usp=sharing">here</a> or <a href="http://unhp.com.pk/papers/Improving_the_Quality_of_Web_Spam_Filtering_by_Using_Seed_Refinement.pdf">here</a> (the journal copy). Interested students/researchers may contact me for any questions, comments and feedback. The full-text of the thesis can also be requested via email.<br />
<br />
For Citation:<br />
<pre style="margin-bottom: 0px; margin-top: 0px;">@article{qureshi2011,</pre>
<pre style="margin-bottom: 0px; margin-top: 0px;"><span style="color: #9a4d00;"> </span><span style="color: #9a4d00;">title={Improving</span><span style="color: #9a4d00;"> the Quality of Web Spam Filtering by Using Seed Refinement},</span></pre>
<pre style="margin-bottom: 0px; margin-top: 0px;"><span style="color: #9a4d00;"> author={Qureshi, Muhammad Atif and Yun, Tae-Seob and Lee, Jeong-Hoon and Whang, Kyu-Young},</span></pre>
<pre style="margin-bottom: 0px; margin-top: 0px;"><span style="color: #9a4d00;"> </span><span style="color: #9a4d00;">journal={Journal</span><span style="color: #9a4d00;"> of the Institute of Electronics Engineers of Korea},</span></pre>
<pre style="margin-bottom: 0px; margin-top: 0px;"><span style="color: #9a4d00;"> </span><span style="color: #9a4d00;">volume={48</span><span style="color: #9a4d00;">},</span></pre>
<pre style="margin-bottom: 0px; margin-top: 0px;"><span style="color: #9a4d00;"> </span><span style="color: #9a4d00;">number={6</span><span style="color: #9a4d00;">},</span></pre>
<pre style="margin-bottom: 0px; margin-top: 0px;"><span style="color: #9a4d00;"> </span><span style="color: #9a4d00;">pages={123-139</span><span style="color: #9a4d00;">},</span></pre>
<pre style="margin-bottom: 0px; margin-top: 0px;"><span style="color: #9a4d00;"> </span><span style="color: #9a4d00;">year={2011</span><span style="color: #9a4d00;">}</span></pre>
<pre style="margin-bottom: 0px; margin-top: 0px;">}</pre>
</div>
Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com0tag:blogger.com,1999:blog-943647122747979667.post-27427177308657468232010-11-22T02:31:00.000-08:002010-11-22T02:32:42.578-08:00Why Writing is Important?<div style="text-align: justify;"><span class="Apple-style-span" >Do I need to learn how to write? What does it mean? Does it only mean I should write correctly in terms of grammar? The answer to 'only grammar' is no.</span></div><div style="text-align: justify;"><span class="Apple-style-span" ><br /></span></div><div style="text-align: justify;"><span class="Apple-style-span" >Let's go to the fundamentals: why do we write? Um... one would say, for communication. Then why do we communicate? To make others understand our viewpoint. So understanding is the most important concept.</span></div><div style="text-align: justify;"><span class="Apple-style-span" ><br /></span></div><div style="text-align: justify;"><span class="Apple-style-span" >Now, what's special about writing, given that we can talk too? The key idea is the preservation of clear thoughts, and that's the magic of writing. Writing in perfect grammar is less important than writing clear and unambiguous thoughts (ones that are interpretable in only one way). Grammar is a side hero in this area, but the main hero is the transfer of clear thoughts.</span></div><div style="text-align: justify;"><span class="Apple-style-span" >In this article, I take grammar as a general representative of syntax (i.e., spelling, tenses, etc.). 
Some people feel that writing is about showing off their fluency in a particular language, but this is a common myth.</span></div><div style="text-align: justify;"><span class="Apple-style-span" ><br /></span></div><div style="text-align: justify;"><span class="Apple-style-span" >Let me give some examples of unambiguous writing so that I can transfer my thought in a clearer way.</span></div><div style="text-align: justify;"><span class="Apple-style-span" >1- This is a sentence where a/b is true.</span></div><div style="text-align: justify;"><span class="Apple-style-span" >Ans- How do I read that? This is a sentence where a or b or both are true. A common mistake in interpretation would be to read it as only "a or b" (at the reader’s end, at the writer’s end, or at both ends).</span></div><div style="text-align: justify;"><span class="Apple-style-span" >2- This is example: This supports the example strongly.</span></div><div style="text-align: justify;"><span class="Apple-style-span" >Ans- How do I read that? This is an example, and because of the colon the following sentence strongly supports it. A common mistake would be for the reader/writer to ignore the ‘:’.</span></div><div style="text-align: justify;"><span class="Apple-style-span" >3- I am writing; I am sharing myself.</span></div><div style="text-align: justify;"><span class="Apple-style-span" >Ans- How do I read that? I am writing, and one reason is to share myself. A common mistake would be for the reader/writer to ignore the ‘;’ or to tie the second part too strongly to the first, i.e., I am writing because I want to share myself.</span></div><div style="text-align: justify;"><span class="Apple-style-span" ><br /></span></div><div style="text-align: justify;"><span class="Apple-style-span" >To conclude my short post, writing is a very effective way of communication. However, it brings complications because it does not involve body language (as we have in talks). 
Therefore, it's important to understand the idea behind this style of communication in order to avoid being misreported or misread. It is an important skill for both readers and writers.</span></div><div style="text-align: justify;"><span class="Apple-style-span" ><br /></span></div><div style="text-align: justify;"><span class="Apple-style-span" >If you find this post interesting, please share it with your friends. This post is applicable to both general and technical writing practices.</span></div><div><br /></div>Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com0tag:blogger.com,1999:blog-943647122747979667.post-21789827588040775802010-09-14T21:46:00.000-07:002010-09-14T21:56:04.987-07:00Steps of utilization of human mind<div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">The following are general steps defining the utilization of the human mind in an activity.</span></div><div><ol><li style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">Question your surrounding</span></li><li style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">Question beyond your surrounding</span></li><li style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">Think about what to adopt</span></li><li style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">Maximize following of adoption</span></li></ol></div><div style="text-align: justify;"><span class="Apple-style-span" style=" ;font-family:'trebuchet ms';">If required, go back to previous steps. 
</span></div><div style="text-align: justify;"><span class="Apple-style-span" style=" ;font-family:'trebuchet ms';"><br /></span></div><div style="text-align: justify;"><span class="Apple-style-span" style=" ;font-family:'trebuchet ms';">People may apply this to maximize anything that interests or affects them, be it education and research, paid professional research, a business need, scientific advancement or religious/non-religious thinking. Activities that do not involve the above general steps fall into norms, culture and control. If an activity expects utilization of the mind but the above steps are not followed, then the person is in illusion. Questioning always breaks illusion, whereas borrowed questions may introduce a stronger illusion; if questions are borrowed, then the expansion through step 2 should not be borrowed in full, in order to keep it reasonably natural. A point to note here: thinking is natural, but illusion is a sleeping drug for the human mind. </span></div><div style="text-align: justify;"><br /></div>Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com0tag:blogger.com,1999:blog-943647122747979667.post-55330842374537421212010-09-03T01:43:00.000-07:002015-06-05T14:31:28.654-07:00[Paper]: Blogosphere Topic Clustering and Ranking<div dir="ltr" style="text-align: left;" trbidi="on">
<div style="text-align: justify;">
The following are the slides from my paper presentation at the <a href="http://www.ukp.tu-darmstadt.de/scientific-community/coling-2010-workshop/">COLING 2010 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources</a>.</div>
<div>
<br /></div>
<div>
<br /></div>
<br />
<center>
<iframe src="//www.slideshare.net/slideshow/embed_code/key/4qxns7xXXeRYeZ" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe> <div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/MAtifQureshi/identifying-and-ranking-topic-clusters-in-the-blogosphere" title="Identifying and ranking topic clusters in the blogosphere" target="_blank">Identifying and ranking topic clusters in the blogosphere</a> </strong> from <strong><a href="//www.slideshare.net/MAtifQureshi" target="_blank">M Atif Qureshi</a></strong> </div>
</center>
<br />
<br />
<div style="text-align: justify;">
The paper deals with the identification and ranking of topic clusters in the blogosphere. In this paper, a topic cluster is a group of blogs that share a common interest, i.e. a topic. The algorithm takes into account both the hyperlinked social network of blogs and the content of the blog posts. Topic-specific ranks are assigned to each blog in the cluster using a metric called “Topic Discussion Rank,” which helps identify the most influential blog for a specific topic. Experiments show that the presented method reaches a high level of accuracy.</div>
<div style="text-align: justify;">
<br /></div>
<div>
<div style="text-align: justify;">
The proposed method takes a hybrid approach: first, the content words of the blog posts are used to determine the relevance of a blog to a given concept, and thus its assignment to a "cluster". Second, the links originating from the blog are used to rank the blogs within the "cluster".</div>
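<div style="text-align: justify;">
The two steps can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the paper's algorithm: the function names, the <code>threshold</code> parameter and the toy data are hypothetical, and the simple count of links stands in for the actual Topic Discussion Rank metric, which is defined in the paper.</div>

```python
from collections import Counter


def content_relevance(posts, topic_terms):
    """Share of a blog's words that are topic terms (an illustrative score only)."""
    words = [w.strip(".,!?;:").lower() for post in posts for w in post.split()]
    if not words:
        return 0.0
    counts = Counter(words)
    return sum(counts[term] for term in topic_terms) / len(words)


def cluster_and_rank(blogs, topic_terms, threshold=0.1):
    """Return the blogs in the topic cluster, ordered from highest to lowest rank."""
    # Step 1: the content words decide which blogs belong to the cluster.
    cluster = {
        name
        for name, blog in blogs.items()
        if content_relevance(blog["posts"], topic_terms) >= threshold
    }

    # Step 2: links originating from a member and pointing at other members
    # decide its position inside the cluster (a stand-in for the real metric).
    def link_score(name):
        return sum(
            1 for target in blogs[name]["links"] if target in cluster and target != name
        )

    # Ties are broken alphabetically so the result is deterministic.
    return sorted(cluster, key=lambda name: (-link_score(name), name))


# Hypothetical toy blogosphere: blog name -> its posts and outgoing links.
blogs = {
    "a": {"posts": ["linux kernel tips", "linux desktop"], "links": ["b", "d", "c"]},
    "b": {"posts": ["linux servers"], "links": ["a"]},
    "c": {"posts": ["cooking pasta tonight"], "links": ["a", "b"]},
    "d": {"posts": ["ubuntu linux install guide"], "links": []},
}

ranking = cluster_and_rank(blogs, {"linux", "ubuntu"})
print(ranking)
```

<div style="text-align: justify;">
In this toy data, blog "c" never mentions the topic, so it is left out of the cluster, and its links do not influence the ranking of the remaining blogs.</div>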
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
In the age of Web 2.0 the blogosphere has assumed a very significant role as a medium for disseminating opinion. This research is part of a long-term project on the blogosphere, and we would like to invite students and researchers who are interested in this area to collaborate with us. Interested readers can contact me by email at atifms@kaist.ac.kr or matifq@yahoo.com, or my colleague at arjumandms@kaist.ac.kr or arjumand_younus@yahoo.com. </div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
The full text of the paper can be downloaded from this link: <a href="http://www.aclweb.org/anthology/W/W10/W10-3507.pdf">paper</a>.</div>
</div>
<div style="text-align: justify;">
<br /></div>
<div style="text-align: justify;">
Additionally, a discussion of the workshop on collaboratively constructed semantic resources can be found <a href="http://arjumand-atif.blogspot.com/2010/09/coling-2010-workshop-peoples-web-meets.html">here</a>; your comments are welcome.</div>
</div>
Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com0tag:blogger.com,1999:blog-943647122747979667.post-69640093471470713002010-08-31T08:33:00.000-07:002010-09-01T07:29:17.198-07:00Visit to Microsoft Research Asia: Details on Discussion27th August, 2010 was a memorable day for me, as my colleague (who also happens to be my wife) and I had a tour of Microsoft Research Asia in Beijing, China.<div><br />Mr. <a href="http://research.microsoft.com/en-us/um/people/yucao/">Yunbo Cao</a>, who works there, hosted the two of us. He was extremely humble, and his humility and kindness were evident throughout the session. First we were shown around the research labs and work area; he showed us that, instead of a name plate, the office of each Microsoft Research employee displays the name of the city that employee comes from.</div><div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiPSZWqZJOoXQb4lS-I_FBMVURgF75a6c04ozv5TJHSgZ9UZqBMMXe9vXC6VyAZsZfb9GjxzEuhkQnTRcKdsA93ixom-sI2FCTlcnZOFGVZyJW5JajIAEUpU4VA8ysW2PM_Axj7F3qCZsMQ/s1600/IMG_1978.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiPSZWqZJOoXQb4lS-I_FBMVURgF75a6c04ozv5TJHSgZ9UZqBMMXe9vXC6VyAZsZfb9GjxzEuhkQnTRcKdsA93ixom-sI2FCTlcnZOFGVZyJW5JajIAEUpU4VA8ysW2PM_Axj7F3qCZsMQ/s400/IMG_1978.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5511604153870975042" /></a><br /></div><div>Microsoft Research Asia is Microsoft’s fundamental research arm in the Asia Pacific region and it was founded on November 5, 1998. 
In 2004, <a href="http://www.technologyreview.com/Biztech/13616/">MIT Technology Review named Microsoft Research Asia “the hottest computer lab in the world.”</a> Many technologies that have had a huge impact on today's technological community have emerged from Microsoft Research Asia. Over 200 innovations from the lab have been transferred to Microsoft products, including Office XP, Office System 2003, Windows XP, Windows Server 2003, Windows XP Media Center Edition, Windows XP Tablet PC Edition, Xbox, MSN and Windows Live. In addition, technologies from the lab have been adopted by international standards bodies such as MPEG4 (error-resilient video transmission), IETF (TCP/IP header compression), and ITU/ISO (video-compression technology).</div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSrvHySt__vn8e4bJMs6o8m9hjgbKhCAR0zreymd446yjeILBajdq3BAnEmL5ePiuHhhW9Cfz03vVmSB_JnR23DaIMm8upcfE4Vy7AfVYEnuUJedJ4hYktrVzvY2V7QmRSxh5tqM2m2H-h/s1600/IMG_1975.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSrvHySt__vn8e4bJMs6o8m9hjgbKhCAR0zreymd446yjeILBajdq3BAnEmL5ePiuHhhW9Cfz03vVmSB_JnR23DaIMm8upcfE4Vy7AfVYEnuUJedJ4hYktrVzvY2V7QmRSxh5tqM2m2H-h/s400/IMG_1975.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5511617714194910242" /></a><br /><div>The prominent labs at Microsoft Research Asia which Mr. Yunbo talked about are:</div><div><ul><li><b>Web Search and Mining</b>: the goal of this group is to drive the next generation of Web search by leveraging data mining, machine learning, and knowledge discovery techniques for information analysis, organization, retrieval, and visualization. 
Its core areas focus on structuralizing the Web, vertical search, a large-scale experimental web search platform, mobile search and multimedia search.</li><li><b>Information Retrieval and Mining</b>: the goal of this group is to develop advanced technologies to help users accurately, quickly, and easily find information. Currently, the group is working on three projects: algorithms for improving web search, enterprise search, and community search. The following research areas are being intensively investigated: search relevance and learning to rank, link analysis and web graph mining, anti-spam and adversarial information retrieval, document information extraction, and search log data mining. </li><li><b>Natural Language Computing</b>: this group is focusing its efforts on a variety of research topics, including multi-language text analysis, machine translation, cross language information retrieval, and question answering. Over the years, the group has made significant contributions to Microsoft products, including a Japanese and Chinese Input Method Editor (IME), an English writing assistant for Office 2007, a Chinese couplet game for Windows Live, a Chinese word breaker, and pinyin search.</li><li><b>Web Intelligence</b>: the aim of this group is to enable synergetic collaboration among people, and between people and computers, to enlighten them and enrich their lives. For this mission, researchers of this group develop scalable automatic content analysis methods and quality metrics to analyze a huge amount of online text such as blogs, community-based question answering, forum discussions, news, reviews, Twitter, Wikipedia, etc. and to harvest explicit and implicit knowledge from these media.</li></ul><div>Mr. Yunbo himself is part of the Web Intelligence group and prior to this he was a part of the Natural Language Computing group. 
The areas of focus of his group are expert and social search, user intent/activity recognition and prediction, inarticulate user assistance, information access evaluation, social question answering and summarization, and sentiment analysis. His research work centers on community-based question answering services. </div><div><br /></div><div>He told us that the third and fourth floors of Microsoft Research Asia are called the Microsoft Search Technology Center, and that the main focus of all research there is Bing, Microsoft's newly released search engine. This year's SIGIR had 15 papers by Microsoft Research, and one can see that researchers at Microsoft are working hard day and night to make Bing better and better.</div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOn1QSH_jeZTkmsdtaG5VG9XvAEVB22uSypEDrXiP8j2Dj3kA7DNEhNEnAWuMyIgDX0H6CS6mYe0yAIWrZZbiZyPb7lZHICUH38RYGKdDIjtPye0kY2GtGyObftXrE2R8ajAYhtVHaztwd/s1600/IMG_1974.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOn1QSH_jeZTkmsdtaG5VG9XvAEVB22uSypEDrXiP8j2Dj3kA7DNEhNEnAWuMyIgDX0H6CS6mYe0yAIWrZZbiZyPb7lZHICUH38RYGKdDIjtPye0kY2GtGyObftXrE2R8ajAYhtVHaztwd/s400/IMG_1974.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5511623518331796962" /></a><br /><div>We could not take photographs of the labs as it was not allowed, but photography was allowed in some places; the picture below shows the Microsoft Research Asia recreation area, where the employees enjoy some time off from work. When we arrived it was fruit time. Mr. 
Yunbo offered us some but we could not take any as we were fasting.</div><div><br /></div><br /><a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJcd9NMGZD71fiuFNrwq2rnqlDDPBsBkm7QOF4ajYlCUXqe9yd2BdL_1XfiCO1Lb3TiRfxl7iJM630SpiCrExL3SMfINEoBw2Df97PA___J5Sv0JHUYZj9mctRr7KUVmFAftbVfe_uOFWx/s1600/IMG_1977.jpg"><img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 300px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJcd9NMGZD71fiuFNrwq2rnqlDDPBsBkm7QOF4ajYlCUXqe9yd2BdL_1XfiCO1Lb3TiRfxl7iJM630SpiCrExL3SMfINEoBw2Df97PA___J5Sv0JHUYZj9mctRr7KUVmFAftbVfe_uOFWx/s400/IMG_1977.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5511630044583351122" /></a><br /><div><br /></div><div>The discussion then moved to our research focus at the <a href="http://dblab.kaist.ac.kr/">Database and Multimedia Lab of KAIST</a>. We told him about a state-of-the-art project by our Professor <a href="http://www.springerlink.com/content/a0u420mt6q43j216/">which relates closely to search engines - the paper received the best demonstration award at ICDE 2005</a>. I further explained to him that my Professor considers it an offense to traditional databases if researchers worldwide believe that MapReduce-based systems are the answer for massively scaled information retrieval tasks. The research idea is to incorporate an information retrieval architecture into existing parallel databases to provide the best of both worlds: the scalability of MapReduce-based systems and the additional functionality of databases (SQL, schemas, etc.). These ideas were appreciated by the Microsoft researcher. 
I told him about my thesis problem statement on improving the quality of web search results by combating spam, and my wife told him about her work on scalable, massive architectures for parallel web crawling; he was surprised to hear that we were the only ones in our lab working on these huge project modules and said that at Microsoft they work in groups even at the module level (for the Bing project). We exchanged some information on how the nature of such systems is shifting from static search to dynamic search.</div><br /></div><div>Then we moved to the final part of our talk, which centered on academic collaborations between Pakistani universities and Microsoft Research Asia and on Microsoft Research Asia's future plans for Pakistan. Mr. Yunbo said that Microsoft Research Asia has many collaborations with universities throughout Asia and that Microsoft Research has two research centers in Asia, in China and India, with the China center, as Microsoft Research Asia, heading Asia as a whole. He added that Microsoft Research Asia is always looking for more engineers. I asked why academic/research collaborations exist with universities throughout Asia but not in Pakistan; Mr. Yunbo replied that Microsoft has some collaborations with Pakistani universities, but he agreed that those are only technology-oriented and that none exist at the applied research and academic level. He also pointed out that Microsoft Research Asia would love to have academic/research collaborations with Pakistani universities, and that so far there has been no serious consideration because of the lack of an appropriate channel through which to initiate them. He added that student-researchers like us can serve as the bridge between Pakistani universities and Microsoft Research Asia, and he suggested that we contact the Microsoft Research University Relations Team. Mr. 
Yunbo believes that both Pakistani universities and Microsoft Research can greatly benefit from such academic collaborations and that they can open up new opportunities for researchers in Asia, as intelligence can never be confined to certain regions.</div><div><br /></div><div>On our way through a passage, he showed us souvenirs given to Microsoft Research Asia by different universities of Asia. The Chinese presence was dominant among them, and the collection showed how open Microsoft Research Asia is to visiting different universities. </div><div><br /></div><div> Readers are invited to leave their comments, questions and suggestions, as we intend to carry these plans forward to promote and improve Computer Science research in Pakistan.</div>Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com10tag:blogger.com,1999:blog-943647122747979667.post-25328198356240901822010-07-27T05:23:00.001-07:002015-06-05T14:27:11.659-07:00[Paper]: Revisiting Crawlers’ Role in a Search Engine<div dir="ltr" style="text-align: left;" trbidi="on">
The following are the slides from my paper presentation at ICISA 2010 in Seoul, Korea.<br />
<br />
<center>
<iframe src="//www.slideshare.net/slideshow/embed_code/key/WF8dU8zXxSaOI" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe> <div style="margin-bottom:5px"> <strong> <a href="//www.slideshare.net/MAtifQureshi/analyzing-web-crawler-as-feed-forward-engine-for-efficient-solution-to-search-problem-in-the-minimum-amount-of-time-through-a-distributed-framework" title="Analyzing Web Crawler as Feed Forward Engine for Efficient Solution to Search Problem in the Minimum Amount of Time through a Distributed Framework" target="_blank">Analyzing Web Crawler as Feed Forward Engine for Efficient Solution to Search Problem in the Minimum Amount of Time through a Distributed Framework</a> </strong> from <strong><a href="//www.slideshare.net/MAtifQureshi" target="_blank">M Atif Qureshi</a></strong> </div>
</center>
<br />
<br />
This paper considers tradeoffs in web crawler design, especially from the perspective of events versus threads [1,2]. It also makes some recommendations for better OS support for web crawling. It points out that the two principal problems with web crawling are:<br />
<ul>
<li>Choosing the right pages to crawl<br /></li>
<li>Basic architecture for performing the crawl</li>
</ul>
The work focuses on the second problem. Our proposition is that events are the ideal way to implement web crawlers, because they give better throughput while crawling the web. Furthermore, we argue that the growing usage of search engines calls for a careful redesign of their constituents from the perspective of systems software, and we conclude that the exokernel [3] is the right answer for removing some of the limitations of today's search engines. We recommend that a future operating system be dedicated to search engines.<br />
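<br />
To make the event-driven style concrete, here is a minimal Python sketch using the standard <code>asyncio</code> library. It is not the crawler from the paper: the <code>FAKE_WEB</code> link graph, the simulated <code>fetch</code> and the <code>concurrency</code> parameter are hypothetical stand-ins chosen so that the example runs without network access. The point it shows is that a single thread can keep several fetches in flight by yielding at each I/O wait instead of blocking.<br />

```python
import asyncio

# Hypothetical in-memory "web": URL -> outgoing links. It stands in for real HTTP.
FAKE_WEB = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c", "seed"],
    "c": [],
}


async def fetch(url):
    """Simulated non-blocking fetch; a real crawler would await network I/O here."""
    await asyncio.sleep(0.01)
    return FAKE_WEB.get(url, [])


async def worker(queue, seen):
    """Take URLs from the frontier, fetch them and enqueue unseen links."""
    while True:
        url = await queue.get()
        try:
            for link in await fetch(url):
                if link not in seen:
                    seen.add(link)
                    queue.put_nowait(link)
        finally:
            queue.task_done()


async def crawl(seed, concurrency=3):
    """Crawl everything reachable from seed using one thread and an event loop."""
    queue = asyncio.Queue()
    seen = {seed}
    queue.put_nowait(seed)
    workers = [asyncio.create_task(worker(queue, seen)) for _ in range(concurrency)]
    await queue.join()  # returns once every queued URL has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return seen


crawled = asyncio.run(crawl("seed"))
print(sorted(crawled))
```

Because all workers run on one event loop, the <code>seen</code> set needs no lock, which is one of the usual arguments made for events over threads.<br />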
<div>
<br /></div>
<div>
If you are interested in more details, please contact me by email at atifms@kaist.ac.kr or matifq@yahoo.com. You can also request a copy of the paper by personal email.</div>
<div>
<br /></div>
<div>
<b>References</b></div>
<div>
<i>[1] von Behren, R., Condit, J., and Brewer, E. Why Events are a Bad Idea (for High-concurrency Servers). In 9th Workshop on Hot Topics in Operating Systems (HotOS IX), Lihue, Hawaii, May 2003.</i></div>
<div>
<i>[2] Ousterhout, J. Why threads are a bad idea (for most purposes). Invited talk presented at the 1996 USENIX Annual Technical Conference, San Diego, CA, January 1996.</i></div>
<div>
<i>[3] Engler, D. R., Kaashoek, M. F., and O'Toole, J. 1995. Exokernel: an operating system architecture for application-level resource management. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles (Copper Mountain, Colorado, United States, December 03 - 06, 1995). M. B. Jones, Ed. SOSP '95. ACM, New York, NY, 251-266.</i></div>
</div>
Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com0tag:blogger.com,1999:blog-943647122747979667.post-41814086072386832932010-06-02T04:02:00.000-07:002010-09-26T11:10:42.254-07:00Blog Research: 19 GB data processing with 2 GB active RAM<div style="text-align: justify;"><br /><span class="Apple-style-span" style="font-family:'trebuchet ms';">A few days ago I was performing blogosphere analysis using my crawler "VisionerBot", which I recently presented at ICISA 2010 in Seoul. I had quite a tough time, not to mention a number of sleepless nights, because the amount of data was huge :). My task was to process blogs on the blogspot domain to find the interests of their users. In this post I present the problem I faced and the technique I used to overcome it:</span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';"><br /></span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">Input: 69 Blogs</span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">Objective:</span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">1) Find 5,000 Blogs</span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">2) Process at most 2,000 Blog posts per blog</span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">Achievement:</span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">Total Blogs found: 5,067</span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">Total Alive Blogs: 4,552</span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">Total Number of 
Posts: 1,704,587</span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">Problems:</span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">Size of data that needs to be processed: 19 GB+</span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">Size of available active RAM: 2 GB</span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';"><br /></span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">Now what to do from here... When I started working on this, I never expected to find 1,704,587 posts with a data size of 19 GB; I worked almost day and night to make this data fit on my desktop machine. Had I used a database for this experiment, it would have taken me months to download the data, whereas I had a deadline to complete the task within 10 days. So I came up with a new algorithm that I call the "Rack Algorithm": it downloads data into RAM until RAM fills up, then flushes the data to disk and frees the RAM for the rest of the download, and this cycle continues until all the data is downloaded. After the download comes the process of finding the meaningful data in those 19 GB and loading it into RAM for processing; there I managed to shrink it enough to fit inside RAM. 
In this process I used (Key,Value) pairs along with lists.</span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';"><br /></span></div><div style="text-align: justify;"><span class="Apple-style-span" style="font-family:'trebuchet ms';">Finally, a sigh of relief: I accomplished what seemed nearly impossible within 10 days. I am happy that I managed to find opinion clusters of 1,704,586 posts using my upcoming algorithm, which I call TDR (Topic Discussion Rank).</span></div>Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com5tag:blogger.com,1999:blog-943647122747979667.post-56505472680240732142010-05-14T02:16:00.000-07:002014-06-12T12:03:57.534-07:00Welcome Post<div dir="ltr" style="text-align: left;" trbidi="on">
So what should I write here? I think it's time to write something random; something that gives me the freedom to write about whatever I do in my small world of daily science.<br />
<div>
<br /></div>
<div>
If computers were human, I would have three more close friends in my life: my desktop machine, laptop and notebook. Why? Because I spend most of my time with computers :)</div>
<div>
<br /></div>
<div>
I plan to share whatever I think, no matter how scattered or how confined.</div>
<div>
<br /></div>
<div>
I also intend to write about things related to Linux; yeah! I am PC, powered happily by Linux. So those of you who are looking to try Linux, feel free to get in contact with me. My prescription: Windows is a cancer for researchers and the cure is Linux. Even if you don't know how to operate Linux, don't be afraid: Linux is now friendlier than ever before. Try Ubuntu.</div>
<div>
<br /></div>
</div>
Anonymoushttp://www.blogger.com/profile/04121454908316607335noreply@blogger.com0