INDEX
Explanations
mentions of the city Oxford
Negative Logits
venge             -0.81
lyak              -0.80
quo               -0.77
rill              -0.76
selage            -0.75
ACTED             -0.69
SHIP              -0.69
++++++++++++++++  -0.68
ACTION            -0.67
++++++++          -0.66
Positive Logits
shire             1.55
University        1.02
Circus            0.99
comma             0.93
Oxford            0.88
Analy             0.84
Curve             0.81
Dictionary        0.81
Shakespeare       0.80
Laure             0.79
Activations Density 0.019%