Is the app any good?

It's nice knowing who our users are, but I'd really like to know whether the app is successful at matching roommates. Unfortunately, short of stalking every one of the app's 30,000-odd users, we can't know whether people who meet through the app ever end up getting an apartment together.

Rather than answer the difficult question "how many successful roommate relationships came out of this app?", I decided to focus on the narrower question "how many conversations occurred on this app?"

Luckily I had access to the message log!



Some Background Stats

I looked at just over 11,000 messages sent over a 9-month period, of which only about 6% received a response (!). This made me curious about why the response rate was so low. If I were a user of this app, I wouldn't be happy if only about 1 out of every 17 messages I sent ever got an answer.



The Data

I needed to rearrange the data so I could pose my question in a machine-readable way. There were three types of features I wanted to consider, shown here with some examples:

  1. Features from the message:
    • How many words long is it?
    • When was it sent?
  2. Features from the users:
    • How old are they?
    • When do they need a room?
  3. Features from combinations of the message and users:
    • How similar are the users in age?
    • How long after the message was sent does the recipient need a room?


Features from the message

The first set of features was the easiest. I had a table that looked roughly like this, with the message ID and key information about each message.

Message ID | Sender ID | Receiver ID | Date Sent | Time Sent | Conversation Length
31415 | 987 | 654 | 2017-05-19 | 18:34 | 5
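As a minimal sketch of the message-level features, assuming the text of each message is available (the table above doesn't show it) and using a hypothetical helper called `message_features`, the word count and sending time can be pulled out with Python's standard library:

```python
from datetime import datetime


def message_features(body: str, date_sent: str, time_sent: str) -> dict:
    """Hypothetical helper: derive simple features from one message."""
    # Combine the separate date and time columns into one timestamp.
    sent = datetime.strptime(f"{date_sent} {time_sent}", "%Y-%m-%d %H:%M")
    return {
        "word_count": len(body.split()),  # whitespace-delimited word count
        "hour_sent": sent.hour,           # 0-23, as logged
        "weekday_sent": sent.weekday(),   # 0 = Monday, so 4 = Friday
    }


features = message_features("Hi! Are you still looking for a place?", "2017-05-19", "18:34")
print(features)  # {'word_count': 8, 'hour_sent': 18, 'weekday_sent': 4}
```

Storing the weekday as an integer rather than a name keeps it locale-independent and easy to feed to a model.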



Features from the users

Information about the users is stored in a separate table which looks something like this:

User ID | Age | Gender | Needs Room By
987 | 23 | Female | 2017-06-01
654 | 24 | Female | 2017-06-05

I first combined the conversation table with the user table. Message 31415 was sent from user 987 to user 654, so information about those two users is added to the conversation table. (The new columns are the ones prefixed S. for sender and R. for receiver.)

Message ID | Sender ID | Receiver ID | Date Sent | Time Sent | Conversation Length | S. Age | R. Age | S. Gender | R. Gender | S. Needs Room | R. Needs Room
31415 | 987 | 654 | 2017-05-19 | 18:34 | 5 | 23 | 24 | Female | Female | 2017-06-01 | 2017-06-05
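Here is a stdlib-only sketch of this join. It assumes the user table is loaded as a dict keyed by user ID and the messages as a list of row dicts; all field names and the `join_users` helper are hypothetical. Each message row gets the sender's fields prefixed `s_` and the receiver's fields prefixed `r_`:

```python
from datetime import date

# Hypothetical in-memory versions of the two tables shown above.
users = {
    987: {"age": 23, "gender": "Female", "needs_room_by": date(2017, 6, 1)},
    654: {"age": 24, "gender": "Female", "needs_room_by": date(2017, 6, 5)},
}
messages = [
    {"message_id": 31415, "sender_id": 987, "receiver_id": 654,
     "date_sent": date(2017, 5, 19), "time_sent": "18:34", "conv_len": 5},
]


def join_users(message: dict, users: dict) -> dict:
    """Return a copy of the message row with sender (s_) and receiver (r_) columns added."""
    row = dict(message)
    for prefix, key in (("s_", "sender_id"), ("r_", "receiver_id")):
        user = users[message[key]]  # raises KeyError if the user is missing
        for field, value in user.items():
            row[prefix + field] = value
    return row


joined = [join_users(m, users) for m in messages]
print(joined[0]["s_age"], joined[0]["r_age"])  # 23 24
```

In pandas, the same idea would be two merges against the user table: one keyed on the sender ID and one keyed on the receiver ID.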



Features from combinations of the message and users

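Finally, once I had information such as the ages of the two users, I could calculate interaction features such as the difference in age between them. Each user's urgency is the number of days between when the message was sent and when that user needs a room. (The new columns are the last five, from Age Diff onward.)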

Message ID | Sender ID | Receiver ID | Date Sent | Time Sent | Conversation Length | S. Age | R. Age | S. Gender | R. Gender | S. Needs Room | R. Needs Room | Age Diff | Same Gender? | S. Urgency | R. Urgency | Diff in Urgency
31415 | 987 | 654 | 2017-05-19 | 18:34 | 5 | 23 | 24 | Female | Female | 2017-06-01 | 2017-06-05 | 1 year | Yes | 13 days | 17 days | 4 days
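The next block sketches the interaction features. It assumes urgency is measured in whole days from the send date to each user's needs-room date, and that both differences are taken as absolute values. The hypothetical `interaction_features` helper and its argument names are illustrative, not the author's actual code:

```python
from datetime import date


def interaction_features(date_sent, s_age, r_age, s_gender, r_gender,
                         s_needs_room, r_needs_room):
    """Hypothetical helper: pairwise features for one sender/receiver pair."""
    # Urgency: days from the message being sent to each user's needs-room date.
    s_urgency = (s_needs_room - date_sent).days
    r_urgency = (r_needs_room - date_sent).days
    return {
        "age_diff": abs(s_age - r_age),                   # assumed absolute difference
        "same_gender": s_gender == r_gender,
        "s_urgency_days": s_urgency,
        "r_urgency_days": r_urgency,
        "urgency_diff_days": abs(s_urgency - r_urgency),  # assumed absolute difference
    }


row = interaction_features(date(2017, 5, 19), 23, 24, "Female", "Female",
                           date(2017, 6, 1), date(2017, 6, 5))
print(row)
# {'age_diff': 1, 'same_gender': True, 's_urgency_days': 13, 'r_urgency_days': 17, 'urgency_diff_days': 4}
```

Whether to keep the sign of the differences is a modeling choice. A signed urgency difference would also tell the model which user is in more of a hurry.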


The Targets

My goal is to use the information I have to predict whether this conversation lasted more than a single message. In this case, we know that the conversation length is five messages. I would call that a win for the app!

To create X, my feature table, I removed the "Conversation Length" variable: it directly encodes the outcome I'm trying to predict, so keeping it would be cheating. I also removed a few features that carry no useful signal, such as the IDs.

To create y, my target variable, I labeled any conversation that was at least two messages long as 1 and shorter conversations as 0.
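The following is a sketch of the X/y split. It assumes each joined row is a dict with a `conv_len` field and that the ID columns are the ones to drop. The names are hypothetical, and the second row is made-up data, included only so both labels appear:

```python
def build_xy(rows, target_col="conv_len",
             drop_cols=("message_id", "sender_id", "receiver_id")):
    """Hypothetical helper: split joined rows into features X and binary target y."""
    X, y = [], []
    for row in rows:
        # 1 if the message got at least one reply (conversation of 2+ messages), else 0.
        y.append(1 if row[target_col] >= 2 else 0)
        # Drop the target itself (leakage) and the uninformative ID columns.
        X.append({k: v for k, v in row.items()
                  if k != target_col and k not in drop_cols})
    return X, y


rows = [
    {"message_id": 31415, "sender_id": 987, "receiver_id": 654, "conv_len": 5, "age_diff": 1},
    {"message_id": 27182, "sender_id": 111, "receiver_id": 222, "conv_len": 1, "age_diff": 7},  # made-up
]
X, y = build_xy(rows)
print(y)  # [1, 0]
print(X)  # [{'age_diff': 1}, {'age_diff': 7}]
```

On the real data, `sum(y) / len(y)` would give the response rate, so it should come out near the roughly 6% quoted earlier.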