An Application Of Information Retrieval Theory

Web Search: An Application of Information Retrieval Theory
Term Project, Summer I 2015

Note: This project assignment is adapted from Dr. Xiannong Meng's courseware.

The goal of the project is to produce a limited-scale but functional search engine. The search engine should be able to provide a list of relevant documents when a query is given, just as any commercial search engine would. It is limited in scale in that it is required to collect only a limited number of documents (e.g. on the order of a few hundred to a few thousand). The more documents your search engine can collect, the better.

This is a multi-phase team project. It will start at the beginning of the semester and last through the semester. The detailed scope of the project, the team organization, technical information, and other details will be given as the semester progresses. An overview and the first part of the project are given here.

Components of a Search Engine

A search engine consists of a collection of software components that work together to collect and analyze a large number of documents over the Internet and to give the user a list of relevant documents and URLs when a query is issued. Major components of a typical search engine include the following:

  • a user interface, which takes the user input, passes the query to the engine, and displays the results sent back by the engine;
  • a crawler, which visits the Web and collects information about all the documents it encounters;
  • an indexer, which indexes each of the pages collected by the crawler and establishes links between keywords and the documents that contain these keywords;
  • a ranker/retriever, which ranks the documents for a query according to certain measures and retrieves the top relevant documents for the user;
  • a back-end engine, which takes care of network and file operations.
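As a rough illustration of the indexer and ranker/retriever roles listed above, the following sketch builds a small in-memory inverted index and ranks documents by how often the query terms occur in them. The function names (`tokenize`, `build_index`, `search`) and the raw-count scoring are assumptions made for this example only; the assignment does not prescribe a data structure or a ranking measure.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each keyword to {doc_id: occurrence count} (an inverted index)."""
    index = defaultdict(Counter)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index


def search(index, query, top_k=10):
    """Score documents by summed query-term counts; return the top doc ids."""
    scores = Counter()
    for term in tokenize(query):
        # .get avoids creating empty entries for unknown terms.
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties broken by doc id for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Hypothetical three-document corpus, for illustration only.
    docs = {
        "a.html": "The crawler visits the Web",
        "b.html": "The indexer links keywords to documents",
        "c.html": "web web crawler",
    }
    print(search(build_index(docs), "web crawler"))
```

Raw term counts are the simplest possible measure; a later phase could swap in a weighting scheme such as TF-IDF without changing the index layout.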
Figure 1 indicates the relation among the different components in a typical search engine.

Figure 1: Components of a Typical Search Engine

A search engine consists of two major parts, somewhat independent of each other, as can be seen from the figure. One is on the left side of the document collection, which answers users' queries. The other is on the right side of the document collection, which collects information from the Web so that the URLs related to user queries can be retrieved.

A crawler goes around the Web to read Web pages and to extract information about each Web page it reads. The information is then sent to an indexer. The indexer takes this information and creates the links between keywords and the documents that contain these keywords. The result is typically saved into a file or a collection of files.

When a user issues a query, the document list is searched and a collection of relevant documents is generated. The ranker is responsible for ranking these documents according to certain algorithms and measures. The top-ranked documents are returned to the user for review. At this point the user can review the documents and send feedback to the search engine. The ranker may take the feedback into account and re-select or re-rank the documents for the user to view.

Project Team

The project will be carried out in teams of at most 4 (four) members.

System Requirement

Use your own personal computer for development. You need to have a local web server that allows users to access it through a web browser as an interface.

Your Work in Phase One and Some Technical Details

Your phase one work is to implement a basic version of the interface and the back-end engine. This is just a framework. As the project progresses, some of these components will need to be enhanced. The interface part is responsible for the following main tasks.
  • Display a main search engine page (you should design this page carefully, because it will be the gateway of your search engine to the world).
  • Read the user inputs from the browser. Say, the user enters "Hi, this is Bob."
  • Display the results generated by the search engine. For the same input as above, the search engine should return something like "Good to know you, Bob."
  • Implement a crawler to retrieve 100 textual documents from www.cs.panam.edu or from any other web site that permits your crawler to retrieve documents. Save these documents in a directory on your local server. Remember, these 100 documents will be the corpus for your search engine.

What to Submit

Your team needs to hand in the following documents to BlackBoard. One submission is allowed per team.

1. A team report for phase one of the project with a cover sheet. The report shouldn't be too long, maybe two to three pages. The report should include the following as a minimum:
   (a) The name of your team (should be the same as your search engine);
   (b) A description of the roles of each team member and the contributions of each member;
   (c) A summary of the working process, e.g. what the team started with, what the team has accomplished, any problems encountered, how the team solved them, and any thoughts on the project, among others.
2. Source code for the programs and HTML pages.
3. Snapshots of sample runs from a Web browser.
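
The phase one crawler task (retrieve about 100 textual documents and save them in a local directory) might be approached along the following lines. This is a minimal sketch that uses only the Python standard library, not a required design: the names `LinkParser`, `fetch`, and `crawl`, the breadth-first order, the same-host restriction, and the `docNNN.html` file naming are all assumptions made for illustration. The sketch does not consult robots.txt, so a check that the site permits crawling would still need to be added.

```python
import os
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its text (assumed default fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed_url, out_dir, max_docs=100, fetch_page=fetch):
    """Breadth-first crawl within the seed's host; save up to max_docs pages.

    Returns a list of (url, saved_path) pairs in the order pages were saved.
    """
    os.makedirs(out_dir, exist_ok=True)
    host = urlparse(seed_url).netloc
    queue = deque([seed_url])
    seen = {seed_url}
    saved = []
    while queue and len(saved) < max_docs:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except (OSError, ValueError):
            continue  # skip pages that fail to download
        path = os.path.join(out_dir, f"doc{len(saved):03d}.html")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(html)
        saved.append((url, path))
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" suffixes.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return saved
```

A call such as `crawl("http://www.cs.panam.edu/", "corpus")` would then fill a local `corpus` directory; the `http://` scheme and the directory name are assumptions here. Passing a different `fetch_page` function makes it possible to try the crawler on canned pages without touching the network.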
