GitHub - michael-grotelueschen/amicus

# Amicus

Overview

The Supreme Court consists of nine justices who decide cases brought before them by petitioners. The petitioners' opponents in a case are called respondents. The court's decisions are based in part on oral arguments, during which the attorneys for both the petitioners and respondents answer questions from the justices.

The purpose of Amicus is to predict Supreme Court decisions based on the behavior of attorneys and justices during oral arguments.

Result

Amicus achieves accuracy comparable to that of a control model (Katz et al., 2014) that expert legal analysts built on legal information, even though Amicus uses no features based on legal expertise. Instead, its features are law-agnostic textual details found in oral argument transcripts, such as laughter, pauses, and interruptions.

Pipeline

Amicus has four parts:

1. Obtain oral argument transcripts
2. Process them
3. Extract behavioral features from the text
4. Model

Supreme Court oral argument transcripts are public and freely available at supremecourt.gov. The site was crawled, and more than 700 transcripts were downloaded in PDF format. Apache Tika was then used to extract the raw text from the PDFs. A few Python scripts cleaned the raw text and converted it into a format that makes analysis as easy as possible; there are many examples of this format in txts_whitelist. Next, behavioral features such as laughter, pauses, interruptions, and mentions of other cases were collected from the text. These and other features were collected separately for justices, petitioner attorneys, and respondent attorneys. Finally, the features were used in a logistic regression model. More detail can be found in the code directory.
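To make the feature-extraction step more concrete, here is a minimal sketch of counting behavioral features per speaker role. It is not the code from this repository. It assumes a hypothetical cleaned format with one speaker turn per line (`SPEAKER: text`), a standalone `(Laughter.)` line for laughter, and a trailing `--` to mark a turn that was cut off. The names `SAMPLE`, `SIDES`, and `extract_features` are illustrative only, and the real files in txts_whitelist may be laid out differently.

```python
import re
from collections import Counter

# Hypothetical cleaned transcript: one speaker turn per line.
SAMPLE = """\
JUSTICE SCALIA: Counsel, what is your best case for that --
MR. SMITH: Your Honor, I would point to Marbury v. Madison.
JUSTICE KAGAN: I'm not sure that helps you.
(Laughter.)
MR. SMITH: Well, the statute says --
JUSTICE SCALIA: But the statute is silent on this point.
MS. JONES: We read Miranda v. Arizona differently.
"""

# Hypothetical mapping from attorney label to the side they argue for.
SIDES = {"MR. SMITH": "petitioner", "MS. JONES": "respondent"}

TURN = re.compile(r"^(?P<speaker>[A-Z][A-Z. ]+): (?P<text>.*)$")
# Rough pattern for case names such as "Marbury v. Madison" (an assumption).
CASE_MENTION = re.compile(r"\b[A-Z][a-z]+ v\. [A-Z][a-z]+")


def role_of(speaker, sides):
    """Map a speaker label to justice, petitioner, respondent, or unknown."""
    if "JUSTICE" in speaker:
        return "justice"
    return sides.get(speaker, "unknown")


def extract_features(transcript, sides):
    """Count turns, cut-off turns, laughter, and case mentions per role."""
    counts = Counter()
    last_role = None
    for line in transcript.splitlines():
        line = line.strip()
        if line == "(Laughter.)":
            # Attribute laughter to the role that spoke just before it.
            if last_role is not None:
                counts[last_role + "_laughter"] += 1
            continue
        match = TURN.match(line)
        if match is None:
            continue  # skip lines that are not speaker turns
        role = role_of(match["speaker"], sides)
        text = match["text"].rstrip()
        counts[role + "_turns"] += 1
        if text.endswith("--"):
            # A trailing "--" is treated here as an interrupted turn.
            counts[role + "_interrupted"] += 1
        counts[role + "_case_mentions"] += len(CASE_MENTION.findall(text))
        last_role = role
    return dict(counts)


features = extract_features(SAMPLE, SIDES)
for name in sorted(features):
    print(name, features[name])
```

Counts like these, computed once per transcript, form one row of a feature table that a logistic regression model could then be fitted on.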

Future Steps

- Make the data pipeline more robust
- Explore interaction terms
- Explore more data sources

This project has a lot of potential. One idea is to make the data pipeline more robust, although this would be difficult because of the variability in the transcript PDFs and in the text output from Apache Tika. Another is to explore interaction terms and different models. The most interesting improvement would be to explore and incorporate more data sources. Attempts to predict Supreme Court decisions have so far achieved similar results irrespective of data source, whether this project, which uses around 700 oral argument transcripts, or the control model (Katz et al., 2014), which uses 60 years of complex case law and justice data. My intuition is therefore that the best dataset for this problem has not yet been found.
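As an illustration of what exploring interaction terms could look like, the sketch below extends a row of behavioral counts with every pairwise product before it is passed to a model. The feature names and the `add_interactions` helper are hypothetical and are not taken from this repository.

```python
from itertools import combinations


def add_interactions(features):
    """Return a copy of `features` extended with all pairwise products."""
    expanded = dict(features)
    # Sort the names so the generated keys are stable across runs.
    for a, b in combinations(sorted(features), 2):
        expanded[f"{a}*{b}"] = features[a] * features[b]
    return expanded


# Hypothetical counts for a single oral argument.
row = {
    "justice_laughter": 2,
    "petitioner_interrupted": 3,
    "respondent_interrupted": 1,
}
expanded = add_interactions(row)
for name in sorted(expanded):
    print(name, expanded[name])
```

With n base features this adds n(n-1)/2 columns, so with only a few hundred transcripts it would probably be worth limiting the pairs or regularizing the model to avoid overfitting.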

