INDEX
Explanations
references to legal or political actions involving specific entities
the word "the" in various contexts
Negative Logits
heit            -0.72
TED             -0.71
onday           -0.68
ben             -0.67
aces            -0.66
Interstitial    -0.65
tle             -0.64
preceded        -0.64
AMS             -0.64
worn            -0.64
Positive Logits
latter          1.12
strongest       1.06
slightest       1.01
emergence       1.01
hars            0.95
widest          0.95
same            0.93
greatest        0.92
largest         0.92
entire          0.91
Activations Density: 0.229%