
Operation Restore Uber Account

Disclaimer: this crawler was operational as of 05/18/2020

Objective

The goal of this project is to create a crawler to scrape LinkedIn for all Uber employees in San Francisco and São Paulo and mass email them a letter, hoping that someone, somewhere would be customer-oriented enough to reactivate my account.

Background

The idea for this project emerged when my Uber account was disabled and, after a long time trying to restore it, I became very frustrated and finally gave up on fixing it via customer service. If you are interested in knowing the whole story, you should read the email I sent them. Long story short, my account kept getting blocked for no reason. Every time, I would reach out to their customer service, they would say it was just a glitch, and they would promptly reactivate it. This went on and on until one day, without any explanation, they permanently disabled my account and stopped responding to me.

At this point, the smart move would have been to just change my phone number and get a new account. But after going through all that headache and spending a lot of time emailing them back and forth, I refused to bow, and instead had the "brilliant idea" of spending even more of my time on this issue, which became this project.

Dataset

Uber doesn't really have a catalog or an API I could just pull their employees' emails from, so I had to get the data myself. While exchanging emails with Uber, I noticed that their addresses come in two different formats, as shown in the example below:

Rafael Gama

rafael@uber.com
rafaelg@uber.com

Now that I knew Uber's email formats, all I needed was a list of employees' first and last names to construct the emails. To get it, I developed a crawler that goes to LinkedIn's people search section, selects a city and a company, and scrapes all the results.

Crawler Development

LinkedIn proved to be a worthy adversary and a particularly hard site to crawl. After analyzing the task, I came up with the following set of challenges and solutions:

  • LinkedIn requires JavaScript to render content on the page:

    I am familiar with both Selenium and Requests, but because of this requirement I couldn't crawl the website using Requests alone, so I had to settle for Selenium.

  • LinkedIn's servers reject a large series of requests from the same IP address in a given time period and suspend your ability to search for a few days. Also, the page rendering time varied greatly between requests:

    To avoid long sleep times, I tried to use Selenium's explicit waits to wait dynamically for the page to load, but had little success. The code was unstable and broke all the time, so I was forced to increase the sleep times to solve both problems.

  • LinkedIn is on the lookout for bot behavior patterns:

    To reduce my chances of being flagged, I created and applied a set of functions that try to simulate human behavior by scrolling up and down and randomly clicking on profiles (see the sketch after this list).
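
Below is a minimal sketch of the human-behavior idea, assuming Selenium with the Chrome driver; the function names, URL, and timing ranges are illustrative, not the project's actual code:

    import random
    import time

    from selenium import webdriver

    def human_pause(low=2.0, high=6.0):
        # Sleep for a random interval so request timing isn't uniform
        time.sleep(random.uniform(low, high))

    def human_scroll(driver, steps=5):
        # Scroll the page down in small random increments, like a reader would
        for _ in range(steps):
            driver.execute_script(
                "window.scrollBy(0, arguments[0]);", random.randint(200, 600)
            )
            human_pause(0.5, 2.0)

    driver = webdriver.Chrome()
    driver.get("https://www.linkedin.com/search/results/people/")  # illustrative URL
    human_pause()
    human_scroll(driver)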

Data Preparation

Number of results in each city:

However, not all results were relevant for my purposes: many of the people returned no longer work at Uber, among other issues. So I went on to clean and explore my dataset to come up with a more relevant employee list that would increase my chances of being helped while saving resources. Because the dataset is not complex, there wasn't much to do, so I wrote a little script to execute the following actions (a pandas sketch follows the list):

  • Search and remove duplicates
  • Remove all employees not currently employed at Uber
  • Remove all employees showing "LinkedIn Member" as a name
  • Remove all employees working in the capacity of a driver/motorista (Portuguese)
  • Remove all employees working for Uber Eats, Uber Freight, UberAir/Uber Elevate, and Uber Works
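
For illustration, here is a minimal pandas sketch of those cleaning steps; the file name and column names are assumptions, since the real scraper output may be structured differently:

    import pandas as pd

    # Assumed columns: name, title, company, city
    df = pd.read_csv("uber_employees.csv")

    # Search and remove duplicates
    df = df.drop_duplicates(subset=["name", "title"])

    # Keep only people currently employed at Uber
    df = df[df["company"].str.strip().eq("Uber")]

    # Drop hidden profiles shown as "LinkedIn Member"
    df = df[df["name"] != "LinkedIn Member"]

    # Drop drivers (English and Portuguese job titles)
    df = df[~df["title"].str.contains("driver|motorista", case=False, na=False)]

    # Drop Uber's side businesses
    spinoffs = "Uber Eats|Uber Freight|UberAir|Uber Elevate|Uber Works"
    df = df[~df["title"].str.contains(spinoffs, case=False, na=False)]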

After cleaning the data, I was left with 764 employees (779 using pandas) from San Francisco and 585 employees (692 using pandas) from São Paulo. The discrepancy between the results from my script and from pandas intrigued me a little, but I decided to investigate it at a later time. My next step was to develop an email constructor that returns 2 emails per name, one in each format (sketched below).
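
Based on the two formats shown earlier, the constructor can be quite small; this sketch uses the unidecode package (an assumption on my part) to strip accents from Brazilian names:

    from unidecode import unidecode  # assumption: normalizes names like "João" to "Joao"

    def build_emails(first_name, last_name):
        # Return the two candidate addresses for one employee
        first = unidecode(first_name).strip().lower()
        last_initial = unidecode(last_name).strip()[0].lower()
        return [f"{first}@uber.com", f"{first}{last_initial}@uber.com"]

    print(build_emails("Rafael", "Gama"))
    # ['rafael@uber.com', 'rafaelg@uber.com']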

Execution

Finally, all that was left to do was to integrate the email constructor into the email sender script and start bombarding Uber with emails, right?

Awww, if only it were that easy... There was still one challenge left to overcome: server restrictions at both ends.

  • Both servers can flag my emails as spam
  • Uber's server may perceive my emails as a security threat
  • My provider (Gmail) has email sending limits of their own

Because the content of my emails is exactly the same aside from the person being addressed, they are at considerable risk of being flagged as spam. At the same time, if the emailing frequency is too high, it may trigger a security alert. The million-dollar question is: how much is too much? Because I am not an expert in mail servers, I decided to be cautious and randomly space out the time between emails, sending them in batches of 20 every 10 minutes until hitting 360 sent emails, the daily target I stipulated (sketched below).
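
Here is a minimal sketch of that pacing logic, assuming Gmail's SMTP endpoint with an app password; the credentials, subject, and message body are placeholders:

    import random
    import smtplib
    import time
    from email.message import EmailMessage

    BATCH_SIZE = 20
    BATCH_GAP_SECONDS = 600   # 10 minutes between batches
    DAILY_LIMIT = 360         # stipulated daily target

    def send_batch(server, recipients, body):
        for to_addr in recipients:
            msg = EmailMessage()
            msg["From"] = "me@gmail.com"        # placeholder sender
            msg["To"] = to_addr
            msg["Subject"] = "Please restore my account"
            msg.set_content(body)
            server.send_message(msg)
            time.sleep(random.uniform(5, 25))   # random spacing within a batch

    def send_all(recipients, body):
        sent = 0
        for i in range(0, len(recipients), BATCH_SIZE):
            batch = recipients[i:i + BATCH_SIZE]
            if sent + len(batch) > DAILY_LIMIT:
                break  # stop at the daily cap
            # Reconnect per batch so the session doesn't time out between batches
            with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
                server.login("me@gmail.com", "app-password")  # placeholder credentials
                send_batch(server, batch, body)
            sent += len(batch)
            time.sleep(BATCH_GAP_SECONDS)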

Results

The outcome of the project couldn't have been better. I ended up getting my account restored in the very first round of emails!!!

In the end, 168 emails were sent but only 31 reached their destination. I was expecting the delivery rate to be about twice that, since only one of the 2 email formats per employee should have been wrong. After further investigation, I discovered that Uber uses a few other email formats, which explains why most of the emails were not delivered.

So that's it! Mission accomplished!!! My account is back in business, so let's see if it will remain this way now...

Potential Improvements

Since this is a pet project, the first and foremost goal was to have it up and running as fast as possible so I could actually try to fix my problem and restore my Uber account. However, while I was working on the project, I had many ideas about what could be done to improve the crawler's performance, scalability, and robustness. These are a few of those ideas:

  • Find a way to reduce the number and duration of sleep times
  • Parallelize code in order to run multiple collections simultaneously
  • Rotate between different IPs to reduce the chance of getting flagged
  • Find out the reason for the discrepancy between the results generated by my data cleaning function and by pandas
  • Refactor the code to use a workflow management system so it is more robust to failures and can resume pipelines from the point of failure
