Information Retrieval
Multilingual Search System for Social Network
CSE 535 - Fall 2015
The goal of this project is to build a multilingual faceted search system, including a front end that allows users to search and browse multilingual data based on various criteria: topic, location, person, etc.
- A pure multilingual faceted search system
- Handles queries in five languages: English, Russian, German, French, and Arabic
- Based on a Twitter corpus of around 100,000 tweets
- Data spans more than 120 countries
This option involves leveraging the faceted search capability provided by Solr to allow various types of drill-down. Facets include people, topics, and locations.
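As a rough illustration of the drill-down described above, the sketch below builds a Solr select URL with faceting enabled and pairs up the facet counts that Solr returns. The parameters `facet`, `facet.field`, `facet.mincount`, and `fq` are standard Solr request parameters; the core name `tweets` and the field names `lang` and `place` are assumptions made for this example and are not taken from the project's schema.

```python
from urllib.parse import urlencode


def build_facet_url(select_url, text, facet_fields, filters=()):
    """Build a Solr select URL that asks for facet counts on the given fields."""
    params = [
        ("q", text),
        ("wt", "json"),
        ("rows", "10"),
        ("facet", "true"),
        ("facet.mincount", "1"),  # hide facet values with zero matches
    ]
    # One facet.field parameter per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    # Each filter query narrows the result set (the drill-down step).
    params += [("fq", fq) for fq in filters]
    return select_url + "?" + urlencode(params)


def pair_facet_counts(flat):
    """Turn Solr's default flat list [value, count, value, count, ...] into pairs."""
    return list(zip(flat[0::2], flat[1::2]))


# Hypothetical core and field names, for illustration only.
url = build_facet_url(
    "http://localhost:8983/solr/tweets/select",
    "paris",
    ["lang", "place"],
    filters=['lang:"fr"'],
)
```

Selecting a facet value in the front end would then amount to adding one more `fq` entry and reissuing the request.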
This option involves computing various analytics that provide insight into the data.
Examples include: volume of tweets by region/topic/hashtag, sentiment analysis, analytics illustrating cultural differences, etc.
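A minimal sketch of the volume-style analytics mentioned above, assuming each tweet has already been parsed into a dictionary. The keys `country` and `hashtags` are hypothetical names chosen for this example; the real corpus may use different fields.

```python
from collections import Counter


def volume_by(tweets, key):
    """Count tweets per value of `key`; list-valued fields count each item once per tweet."""
    counts = Counter()
    for tweet in tweets:
        value = tweet.get(key)
        if value is None:
            continue  # skip tweets that lack this field
        values = value if isinstance(value, list) else [value]
        counts.update(set(values))
    return counts


# Tiny hand-made sample, not real data from the corpus.
sample = [
    {"country": "FR", "hashtags": ["cop21", "paris"]},
    {"country": "FR", "hashtags": ["paris"]},
    {"country": "DE", "hashtags": []},
    {"hashtags": ["cop21"]},
]
by_country = volume_by(sample, "country")
by_hashtag = volume_by(sample, "hashtags")
```

The same counts could also be obtained from Solr facets directly; computing them offline like this is simply one option when the data is already loaded in memory.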
In this option, we demonstrate cross-lingual capabilities. This can take many forms: one example involves cross-lingual queries and automatic translation of the resulting foreign-language snippets.
For example, a search for a particular individual, place, or organization should take place simultaneously in multiple languages, achieved by automatically tagging and normalizing entities across languages.
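One way to picture searching in several languages at once is to expand a query term into one clause per language. The sketch below assumes a small in-memory translation table standing in for a translation service, and per-language fields named `text_en`, `text_de`, and so on; both the table and the field names are hypothetical.

```python
def expand_query(term, translations, languages):
    """Build an OR query with one clause per language-specific text field."""
    clauses = []
    known = translations.get(term.lower(), {})
    for lang in languages:
        # Fall back to the original term when no translation is known.
        surface = known.get(lang, term)
        # Escape backslashes and double quotes so the phrase stays well-formed.
        escaped = surface.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'text_{lang}:"{escaped}"')
    return " OR ".join(clauses)


# Hypothetical translation table; a real system would obtain these
# from a translation API or an entity-normalization step.
table = {"germany": {"en": "Germany", "de": "Deutschland", "fr": "Allemagne"}}
query = expand_query("Germany", table, ["en", "de", "fr", "ru"])
```

Here `ru` has no entry in the table, so the original term is reused for `text_ru`.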
This option involves coming up with a novel ranking algorithm for tweets that balances recency with importance of content when presenting tweets. It could also take into account the popularity of a tweet, the influence of the person tweeting, the location of the user, their interests, and so on.
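The following is only an illustrative scoring function, not the ranking algorithm used in this project. It assumes a base relevance score (for example, the one returned by the search engine) and combines it with an exponentially decaying recency term and saturating popularity and influence terms. The weights, the 24-hour half-life, and the choice of retweets and follower count as signals are all assumptions made for the example.

```python
import math


def saturate(count):
    """Map a non-negative count into [0, 1) with diminishing returns."""
    x = math.log1p(count)
    return x / (1.0 + x)


def tweet_score(relevance, age_hours, retweets, followers,
                half_life_hours=24.0,
                w_recency=0.5, w_popularity=0.3, w_influence=0.2):
    """Blend recency, popularity, and author influence into a relevance boost."""
    # Recency halves every `half_life_hours`.
    recency = 0.5 ** (age_hours / half_life_hours)
    boost = (w_recency * recency
             + w_popularity * saturate(retweets)
             + w_influence * saturate(followers))
    return relevance * boost


fresh = tweet_score(1.0, age_hours=0, retweets=0, followers=0)
day_old = tweet_score(1.0, age_hours=24, retweets=0, followers=0)
```

With the default weights, a brand-new tweet with no engagement gets a boost of 0.5 and the same tweet a day later gets 0.25, so an older tweet needs popularity or author influence to stay ahead.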
This option involves inferring some graphical structure from the tweets, based on entities mentioned, topics discussed, etc. Graph structures (or relationships between tweets) could also be inferred through connections between the topics reflected in the tweets.
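One simple structure that could be inferred this way is an entity co-occurrence graph, where two entities are linked whenever they appear in the same tweet. The sketch below assumes that entity extraction has already happened and that each tweet is represented as a list of entity strings; it is an example of the idea, not the project's implementation.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(tweets_entities):
    """Return edge weights: how many tweets mention both entities of a pair."""
    edges = Counter()
    for entities in tweets_entities:
        # Sort and de-duplicate so each undirected pair has one canonical key.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


# Hand-made sample of extracted entities, for illustration only.
graph = cooccurrence_graph([
    ["Paris", "COP21"],
    ["Paris", "COP21", "Obama"],
    ["Obama"],
])
```

Edges with higher weights indicate entities that are frequently discussed together, which could feed a visualization or a related-topics feature.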
We consulted the following sources while designing this search system:
- Introduction to Information Retrieval
- Course by Oresoft LWC
- Apache Solr Tutorials
- Apache Solr Wiki
- Apache Solr Reference Guide
This project uses the following APIs. We are grateful for their contribution:
- Language Detection API of detectlanguage.com
- Microsoft Bing Language Translation API
We are also grateful to Professor Rohini K. Srihari and TAs James Clay, Nikhil Londhe, Chuishi Meng and Ruhan Sa for their continuous support throughout the course (CSE 535), which helped us learn the skills of Information Retrieval and build a Multilingual Search System.
Alexander Simeonov, Akash Desai, Riaz Munshi and Karanjeet Singh
Copyright 2016 Ramanpreet Singh Khinda rkhinda@buffalo.edu, Alexander Simeonov agsimeon@buffalo.edu, Akash Desai akash101192@gmail.com, Riaz Munshi riazmuns@buffalo.edu and Karanjeet Singh karanjee@buffalo.edu
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.