Like me

AsianAve.com wants to provide a personal message similarity feature to its users. Specifically, a user should be able to search for other users who have matching phrases in their personal messages. A phrase is a sequence of contiguous words, ignoring sentence boundaries and punctuation. Two phrases match when they contain the same sequence of words, ignoring case and punctuation. Consider the following 2 personal messages:

  • I am into long walks on the beach and sipping wine by moonlight. Send me a note if this sounds like your idea of a good time.
  • Hi friends! I'm here to meet new people. Some of my hobbies include sipping wine by moonlight and watching cheesy movies. Sign my guestbook please!

These 2 personal messages have "sipping wine by moonlight" in common.
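The matching rule above can be sketched as a normalization step followed by a plain comparison. This is only an illustration: the names `tokenize` and `same_phrase` are hypothetical, and treating an apostrophe inside a word (as in "I'm") as part of that word is an assumption the problem statement does not settle.

```python
import re

def tokenize(message):
    """Split a message into lowercase words, dropping punctuation.

    Assumption: an apostrophe between letters stays inside the word,
    so "I'm" becomes the single word "i'm".
    """
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", message.lower())

def same_phrase(first, second):
    """Two phrases match when their normalized word sequences are equal."""
    return tokenize(first) == tokenize(second)

print(tokenize("Hi friends! I'm here to meet new people."))
print(same_phrase("sipping wine by moonlight.", "Sipping wine, by MOONLIGHT"))
```

Because sentence boundaries are ignored, a phrase may span the end of one sentence and the start of the next.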

An additional requirement is that the matching algorithm should be able to ignore certain stop words so that results are more useful. For example, if the word "by" were considered a stop word, then the longest matching phrase in the 2 personal messages would be "sipping wine moonlight".
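One possible way to find the longest matching phrase is to drop the stop words first and then look for the longest run of consecutive words shared by both lists, using the classic longest-common-substring table applied to words instead of characters. The sketch below makes several assumptions that are not in the problem statement: the function names are hypothetical, the phrase is reported in the second message's casing, and ties are resolved in favor of the first longest run found.

```python
import re

WORD = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")

def content_words(message, stop_words):
    """Words of the message in original casing, with stop words removed."""
    return [w for w in WORD.findall(message) if w.lower() not in stop_words]

def longest_common_phrase(first, second, stop_words=frozenset()):
    a = content_words(first, stop_words)
    b = content_words(second, stop_words)
    best_len = best_end = 0
    prev = [0] * (len(b) + 1)   # run lengths for the previous word of a
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x.lower() == y.lower():
                # Extend the run that ended at the previous pair of words.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], j
        prev = cur
    return " ".join(b[best_end - best_len:best_end])

m1 = ("I am into long walks on the beach and sipping wine by moonlight. "
      "Send me a note if this sounds like your idea of a good time.")
m2 = ("Hi friends! I'm here to meet new people. Some of my hobbies include "
      "sipping wine by moonlight and watching cheesy movies. "
      "Sign my guestbook please!")

print(longest_common_phrase(m1, m2))          # sipping wine by moonlight
print(longest_common_phrase(m1, m2, {"by"}))  # sipping wine moonlight
```

The table costs time proportional to the product of the two word counts, which is acceptable for short personal messages.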

Write a program, called "like_me", that takes an example user and finds other users with similar personal messages. The program should accept 3 arguments: the name of an input file that contains a list of users and their personal messages, the name of an input file that contains a list of stop words that should be ignored when considering matches, and the name of the example user. The program should compare each other user's personal message against the example user's personal message and output the longest phrase that appears in both. The longest matching phrase is the phrase with the most contiguous words (ignoring stop words). Your submission must follow the input and output specifications prescribed below. Your program will be tested against several different input files, each with different data.

Input specifications

The personal messages input file provides a series of users and their personal messages. Each line consists of a username, a colon, and then their personal message:

<username>: <personal message>

Here's an example input file:

Barbie: I like long walks and sipping wine by moonlight.
Ken: Some of my hobbies: sipping wine by moonlight and ...
Jennifer: I like to go to movies a lot.
Fred: I like sipping wine by moonlight!
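Note that a personal message may itself contain a colon, as Ken's does, so a parser should split each line only at the first colon. A minimal sketch follows; `parse_users` is a hypothetical helper, and silently skipping blank or malformed lines is an assumption rather than part of the specification.

```python
def parse_users(lines):
    """Map each username to its personal message."""
    users = {}
    for line in lines:
        # partition splits at the first colon only, so colons inside
        # the message are preserved.
        name, sep, message = line.partition(":")
        if not sep:
            continue  # assumption: ignore lines without a colon
        users[name.strip()] = message.strip()
    return users

sample = [
    "Barbie: I like long walks and sipping wine by moonlight.\n",
    "Ken: Some of my hobbies: sipping wine by moonlight and ...\n",
    "\n",
]
users = parse_users(sample)
print(users["Ken"])
```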

The stop words input file provides a list of words that should be ignored when considering matching phrases in personal messages. Each line in the file represents a stop word:

and
by
the

Output specifications

Given an example user, your program must output a list of the other users, ordered lexicographically, and the longest matching phrase in their personal message.

Example output (considering Barbie as the example user and "and", "by", and "the" as stop words):

Fred sipping wine moonlight
Jennifer I like
Ken sipping wine moonlight
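Putting the pieces together, an end-to-end sketch of "like_me" might look like the following. It is one possible reading of the specification, not a reference solution: all function names are hypothetical, the output separator (a single space after the username) is taken from the example output as reproduced here, the phrase is printed in the other user's casing, and "lexicographically" is taken to mean Python's default string ordering.

```python
import re
import sys

WORD = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")

def content_words(message, stop_words):
    """Words in original casing, with punctuation and stop words removed."""
    return [w for w in WORD.findall(message) if w.lower() not in stop_words]

def longest_match(example, other):
    """Longest run of words shared by both lists, compared without case."""
    best_len = best_end = 0
    prev = [0] * (len(other) + 1)
    for x in example:
        cur = [0] * (len(other) + 1)
        for j, y in enumerate(other, start=1):
            if x.lower() == y.lower():
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], j
        prev = cur
    return other[best_end - best_len:best_end]

def like_me(user_lines, stop_lines, example_user):
    """Return one output row per other user, sorted by username."""
    stop_words = {s.strip().lower() for s in stop_lines if s.strip()}
    users = {}
    for line in user_lines:
        name, sep, message = line.partition(":")  # first colon only
        if sep:
            users[name.strip()] = content_words(message, stop_words)
    example = users[example_user]
    return [
        f"{name} {' '.join(longest_match(example, users[name]))}"
        for name in sorted(users)
        if name != example_user
    ]

def main(argv):
    users_path, stop_path, example_user = argv[1:4]
    with open(users_path) as users_file, open(stop_path) as stop_file:
        for row in like_me(users_file, stop_file, example_user):
            print(row)

# Runs only when invoked with exactly three arguments, e.g.
#   python like_me.py users.txt stopwords.txt Barbie
if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv)
```

Called on the example files above with Barbie as the example user, `like_me` returns the three rows shown in the example output.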