Question 1: (6: 4+2)
Try to find a query of the form [Query-term-1, Query-term-2] (without quotes) that,
on Google, produces at least one result that contains only one of the two terms. That is, find
an example where Google does not interpret a two-term query as a conjunction. (If you
have difficulty finding an appropriate query, try one that produces very few hits, say,
(i) Take a screenshot of the first page of Google results (or more if you want to) and label
each result with 2 (both terms occur on the page), 1 (one term occurs on the page), or
0 (neither term occurs on the page).
(ii) Based on this evidence, does Google interpret all queries as a Boolean conjunction?
Question 2: (16: 8+4+4)
Recall and Precision are two important evaluation metrics that we use to analyze a set of
unranked results. Both metrics compare the set of documents retrieved for a given query
with the set of documents that are relevant to the query.
A) Compute Recall, Precision and Precision@k for the following retrievals against
Q1, Q2 and Q3
|Query||Relevant documents||Retrieved documents|
|Q1||1, 14, 17, 23, 24, 33, 54, 55, 59,||2, 5, 7, 23, 33, 50, 55, 59, 77, 98, 99, 101, 103, 110, 120|
|Q2||14, 19, 25, 27, 30, 39, 42, 63, 769,||14, 21, 25, 26, 27, 38, 42, 63, 569, 769, 790, 1565, 1589|
|Q3||8, 11, 32, 54, 67, 69, 78, 79,||11, 13, 17, 19, 21, 32, 77, 79,|
|Q4||4, 26, 38, 63, 569, 769, 790,||14, 21, 25, 26, 27, 38, 63, 88, 769,|
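As a reminder of how these metrics are computed, here is a minimal Python sketch. The document ids in it are made up for illustration and are not taken from the table above. The function names are hypothetical, and Precision@k assumes that the retrieved list is given in rank order.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for one query, treating both lists as sets."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = relevant & retrieved  # relevant documents that were retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


def precision_at_k(relevant, ranked, k):
    """Fraction of the top-k ranked documents that are relevant."""
    relevant = set(relevant)
    top_k = ranked[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant) / k


# Illustrative data only (assumed, not from the assignment table).
toy_relevant = [1, 2, 3, 4]
toy_retrieved = [2, 9, 4, 7, 8]  # assumed to be in rank order

toy_precision, toy_recall = precision_recall(toy_relevant, toy_retrieved)
toy_p_at_2 = precision_at_k(toy_relevant, toy_retrieved, 2)
print(toy_precision, toy_recall, toy_p_at_2)
```

In the toy data, two of the five retrieved documents are relevant, and two of the four relevant documents are retrieved.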
B) Recall and Precision are often discussed together because they focus on complementary
information. If precision is important, then we do not want to see any non-relevant
documents. That is, whatever is retrieved should be relevant. If recall is important, we
want to see all the relevant documents, even if it requires sifting through some
non-relevant ones. Provide and justify two information-seeking tasks where precision is
considerably more important than recall. Similarly, provide and justify two
information-seeking tasks where recall may be more important than precision. [Don’t
forget to justify your choices: the justification will be graded, not the particular choices].
C) The trade-off between Recall and Precision may be user-specific, i.e. some users may be
more interested in precision than recall and vice versa. How might a search engine try to guess,
without asking, whether a user cares more about precision than recall, or vice versa?
Think of the different ways users interact with a search engine and be creative!
Question 3: (6: 3×3)
(a) Suppose we have three collections C1, C2, and C3 that have 500, 15,000 and
documents respectively. We have added all documents in C1 to C2 and C3. Which
collection is likely to have more new terms added to its vocabulary (C1, C2 or C3) and
why? [Heaps’ Law]
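Heaps' Law estimates vocabulary size as V = K * n^beta, where n is the number of tokens in the collection. The sketch below shows how the expected number of new terms could be compared under that law. The parameter values (K, beta), the average document length and the larger collection size are illustrative assumptions, not values given in the question.

```python
def heaps_vocabulary(num_tokens, k=44.0, beta=0.49):
    """Heaps' Law estimate of vocabulary size: V = K * n^beta.

    k and beta are illustrative values; real collections need fitted ones.
    """
    return k * num_tokens ** beta


def new_terms(existing_docs, added_docs, tokens_per_doc=200):
    """Estimated vocabulary growth when added_docs join existing_docs.

    tokens_per_doc is an assumed average document length.
    """
    before = heaps_vocabulary(existing_docs * tokens_per_doc)
    after = heaps_vocabulary((existing_docs + added_docs) * tokens_per_doc)
    return after - before


# Adding 500 documents to a 15,000-document collection versus a
# hypothetical collection that is ten times larger.
growth_small = new_terms(15_000, 500)
growth_large = new_terms(150_000, 500)
print(round(growth_small), round(growth_large))
```

Because beta is less than 1, the curve flattens as the collection grows, so the same batch of documents contributes fewer new terms to a larger collection.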
(b) Calculate the tf-idf weights for the documents below.
a. D1: Sweets Potatoes are Sweet
b. D2: Sweet Oranges are sour and Sweet
c. D3: I have sweet Apple, Sweet Orange, Sweet Potatoes
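One possible way to set up the calculation is sketched below in Python. It assumes raw term counts for tf, idf = log10(N / df), lowercasing, and no stemming (so "Sweets" and "Sweet" count as different terms). Other tf-idf variants are equally valid and give different numbers.

```python
import math
import re
from collections import Counter

docs = {
    "D1": "Sweets Potatoes are Sweet",
    "D2": "Sweet Oranges are sour and Sweet",
    "D3": "I have sweet Apple, Sweet Orange, Sweet Potatoes",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return {doc_id: {term: tf * log10(N / df)}} using raw counts for tf."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    n_docs = len(counts)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tf in counts.values() for term in tf)
    return {
        doc_id: {term: freq * math.log10(n_docs / df[term]) for term, freq in tf.items()}
        for doc_id, tf in counts.items()
    }


weights = tf_idf(docs)
for doc_id, row in weights.items():
    print(doc_id, {term: round(w, 3) for term, w in sorted(row.items())})
```

Under these assumptions a term that occurs in every document, such as "sweet", receives a weight of zero because its idf is log10(1).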
Question 4: (10: 5×2)
(houses OR for OR sale OR in OR Geelong OR Melbourne)
(houses AND for AND sale AND in AND Geelong OR Melbourne)
Suppose these queries are issued to a search engine that uses the ranked Boolean retrieval
model. Assume, for simplicity, that there are only four documents in the collection (with
document ids 1-4). The above table gives the number of times each query term
occurs in each document. Answer the following questions.
(i) Compute the document scores and the ranking associated with the query (houses OR
for OR sale OR in OR Geelong OR Melbourne).
(ii) How is the ranking produced probably sub-optimal and why does this happen?
(iii) Compute the document scores and the ranking associated with the query (houses AND
for AND sale AND in AND Geelong OR Melbourne).
(iv) How is the ranking produced probably sub-optimal and why does this happen?
(v) How would you extend the Boolean retrieval model to handle AND NOT constraints
(e.g., houses AND NOT Geelong)? Your proposed solution should give a higher score
to documents that contain fewer occurrences of the term to the right of the AND NOT
(e.g., Geelong). Please be as mathematical as possible. In other words, saying “I would
reduce the score for documents that contain the word to the right of AND NOT” is too vague.
(vi) Using the index, what would be the Boolean retrieval model scores given to documents
1-4 by your proposed scoring method for the query “houses AND NOT Geelong”?
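To make the scoring in parts (i) and (iii) concrete, here is a minimal Python sketch of ranked Boolean scoring. It assumes one common convention: OR sums the scores of its operands and AND takes their minimum. It also assumes AND binds more tightly than OR in the second query. The term counts are hypothetical and do not reproduce the table referred to in the question.

```python
# Hypothetical term counts per document (NOT the counts from the assignment).
term_counts = {
    1: {"houses": 2, "for": 5, "sale": 1, "in": 7, "geelong": 0, "melbourne": 3},
    2: {"houses": 1, "for": 1, "sale": 1, "in": 1, "geelong": 1, "melbourne": 0},
    3: {"houses": 0, "for": 10, "sale": 0, "in": 12, "geelong": 0, "melbourne": 0},
    4: {"houses": 3, "for": 2, "sale": 2, "in": 4, "geelong": 2, "melbourne": 1},
}


def score(query, counts):
    """Score one document. A query is a term or a tuple (operator, operand, ...)."""
    if isinstance(query, str):
        return counts.get(query, 0)
    operator, *operands = query
    scores = [score(operand, counts) for operand in operands]
    if operator == "OR":
        return sum(scores)  # assumed convention: OR adds evidence
    if operator == "AND":
        return min(scores)  # assumed convention: AND is limited by the weakest term
    raise ValueError(f"unknown operator: {operator}")


def rank(query):
    """Return (doc_id, score) pairs sorted by descending score, then doc id."""
    scored = [(doc_id, score(query, counts)) for doc_id, counts in term_counts.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


or_query = ("OR", "houses", "for", "sale", "in", "geelong", "melbourne")
and_query = ("OR", ("AND", "houses", "for", "sale", "in", "geelong"), "melbourne")

or_ranking = rank(or_query)
and_ranking = rank(and_query)
print(or_ranking)
print(and_ranking)
```

With these made-up counts, the OR query ranks document 3 first even though it contains only "for" and "in", which illustrates how frequent function words can dominate a summed score.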
Question 5: (12: 4×3)
Doc1: A book is considered a good book that makes the reader feels better.
Doc2: I love reading good books to feel better.
Doc3: One can feel better after reading Tom’s recent book.
Query-1: I love books that are good
Query-2: reading good books make you feel better
Stop Word Dictionary=[is, can, after, a, to, I, the, about, that]
Explain the similarity scores of both Query-1 and Query-2 using TF-IDF.
How would the result change if TF-IDF is used instead of TF as the query weighting?
Which do you prefer as the query weighting, TF or TF-IDF? (Support your claim using F
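A possible starting point for the similarity computation is sketched below in Python. It assumes cosine similarity, TF-IDF weights for the documents (raw counts times log10(N / df)), raw TF for the queries, removal of the listed stop words, and no stemming, so "book" and "books" are treated as different terms. These choices are assumptions and other weighting schemes will give different scores.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"is", "can", "after", "a", "to", "i", "the", "about", "that"}

DOCS = {
    "Doc1": "A book is considered a good book that makes the reader feels better.",
    "Doc2": "I love reading good books to feel better.",
    "Doc3": "One can feel better after reading Tom’s recent book.",
}
QUERIES = {
    "Query-1": "I love books that are good",
    "Query-2": "reading good books make you feel better",
}


def tokenize(text):
    """Lowercase, split into words (keeping apostrophes), and drop stop words."""
    words = re.findall(r"[a-z]+(?:['’][a-z]+)?", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


doc_tf = {name: Counter(tokenize(text)) for name, text in DOCS.items()}
n_docs = len(doc_tf)
df = Counter(term for tf in doc_tf.values() for term in tf)
idf = {term: math.log10(n_docs / df[term]) for term in df}
doc_vectors = {
    name: {term: count * idf[term] for term, count in tf.items()}
    for name, tf in doc_tf.items()
}


def similarities(query_text):
    """Cosine similarity of a raw-TF query vector against every document."""
    query_vector = Counter(tokenize(query_text))
    return {name: cosine(query_vector, vec) for name, vec in doc_vectors.items()}


scores = {name: similarities(text) for name, text in QUERIES.items()}
for name, row in scores.items():
    print(name, {doc: round(value, 3) for doc, value in row.items()})
```

To explore the second part of the question, the query vector could be multiplied by the same idf values before calling cosine, and the two sets of scores compared.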