INDEX
Explanations
references to specific universities, particularly Oxford University
mentions of prestigious educational institutions, particularly Oxford and Princeton
Negative Logits
quo                 -0.77
++++++++++++++++    -0.76
Magikarp            -0.72
++++++++            -0.70
selage              -0.69
++++                -0.66
////////            -0.65
venge               -0.65
RANT                -0.65
VICE                -0.64
Positive Logits
shire               1.61
Circus              0.99
bridge              0.92
University          0.89
hurst               0.89
comma               0.86
Oxford              0.84
Shakespeare         0.81
Square              0.80
ington              0.79
Activations Density: 0.025%