A Financial Fraud Detection Case based on Enron

The work of a few bad men, or the dark shadow of the American dream? [1,2]

Upcoming

Submission of group list:				19 Feb 2021
Submission of case study briefing:		26 Feb 2021
Project Presentation:				29-30 Apr 2021
Project Report submission:			8 May 2021

Team Members: Wang Zhiyi, Zhang Siyu, Li Junjie
Ref:
- [1] https://en.wikipedia.org/wiki/Enron:_The_Smartest_Guys_in_the_Room
- [2] https://www.bilibili.com/video/BV124411V7Zx?p=2&spm_id_from=pageDriver
Data Source:
- Kaggle Enron Dataset

Specification of data

This section specifies the data folders and files, organized by processing step and topic.

Data Preprocess

split_1: stores the converted raw data in segments. Each segment contains 10,000 emails, except the last one, which contains 9,999.
email_split: stores the cleaned data extracted from split_1.
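
The segmentation described above can be sketched as follows. This is a minimal illustration, not the project's actual preprocessing script: the function name `split_into_segments` and the in-memory list of emails are assumptions made for the example.

```python
from typing import Iterator, List, Sequence


def split_into_segments(
    emails: Sequence[str], segment_size: int = 10_000
) -> Iterator[List[str]]:
    """Yield consecutive segments of at most `segment_size` emails.

    Hypothetical helper: every segment is full except possibly the last,
    which holds whatever remains.
    """
    for start in range(0, len(emails), segment_size):
        yield list(emails[start:start + segment_size])


# Tiny demonstration with placeholder emails and a small segment size.
emails = [f"email {i}" for i in range(25)]
segments = list(split_into_segments(emails, segment_size=10))
print([len(segment) for segment in segments])  # [10, 10, 5]
```

With the default `segment_size` of 10,000, the final segment simply holds the remainder, which matches the 9,999-email last segment noted above.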

Word Cloud

word_cloud: stores the word cloud pictures as .png files; the accompanying .txt files store the 200 most frequent words.
word_hash:
word_list:
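
A top-200 word list like the one stored in the .txt files could be produced along these lines. This is a sketch using only the standard library; the tokenization rule and the function name `top_words` are assumptions, and the project may filter stop words or tokenize differently.

```python
import re
from collections import Counter
from typing import List, Tuple


def top_words(text: str, n: int = 200) -> List[Tuple[str, int]]:
    """Return the `n` most frequent words in `text` with their counts.

    Assumption: a word is a run of ASCII letters or apostrophes,
    compared case-insensitively.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(n)


# Tiny demonstration on a placeholder sentence.
sample = "The deal closed. The gas deal and the power deal."
print(top_words(sample, n=2))  # [('the', 3), ('deal', 3)]
```

Words with equal counts are returned in order of first appearance, so the list is stable for a given corpus.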

Community Detection

`email_communities`

unweighted
- gexf_files: graph stored in .gexf format.
- gml_files: graph stored in .gml format.
- visualization: community detection pictures.
weighted: not in use yet

Email Content Emotion Analysis

email_book.xlsx: stores each person's email address and name.
email_corpus_by_selected_person: stores, for the persons listed in email_book.xlsx, the email address, email content, name, time stamp, status, etc.
email_corpus: stores the corpus of email content in .txt format.
- Each file stores 10,000 emails, except the last one, which stores 9,999.
email_corpus_by_person: stores the corpus by person; each file name is a sender's email address.
email_graph
- unweighted: stores the "From" and "To" addresses of each email.
- unweighted_clean_data: stores the cleaned "From" and "To" addresses of each email; recommended for use.
- Note: "Cc" addresses are treated as part of the "To" addresses.
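
The rule that "Cc" addresses count as "To" addresses can be illustrated with a short sketch that turns one raw email into sender-to-recipient edges. It is an assumption-laden example rather than the project's code: `extract_edges` is a hypothetical helper, the addresses are placeholders, and lowercasing stands in for whatever cleaning unweighted_clean_data actually applies.

```python
from email import message_from_string
from email.utils import getaddresses
from typing import List, Tuple


def extract_edges(raw_email: str) -> List[Tuple[str, str]]:
    """Return (sender, recipient) pairs for one raw email.

    "Cc" recipients are merged into the "To" recipients, as noted above.
    """
    msg = message_from_string(raw_email)
    senders = [
        addr.lower()
        for _, addr in getaddresses(msg.get_all("From", []))
        if addr
    ]
    recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = [
        addr.lower() for _, addr in getaddresses(recipient_headers) if addr
    ]
    return [(s, r) for s in senders for r in recipients]


# Placeholder message; the addresses are invented for illustration.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com, carol@example.com\n"
    "Cc: Dave <DAVE@example.com>\n"
    "Subject: test\n"
    "\n"
    "Body text.\n"
)
for edge in extract_edges(raw):
    print(edge)
```

Each pair is one directed, unweighted edge; counting repeated pairs would be one way to derive weights for a weighted graph.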
