INDEX
Explanations
names or mentions of people and organizations, particularly related to politics and historical events
repeated references to the abbreviation "hr"
Negative Logits
eers      -0.72
Totem     -0.69
BALL      -0.69
cens      -0.67
Point     -0.62
eties     -0.62
seeded    -0.61
Hots      -0.60
Austral   -0.60
Clash     -0.59
Positive Logits
ud        0.98
anging    0.98
uty       0.97
acht      0.94
iller     0.93
anged     0.93
hr        0.90
rr        0.87
yth       0.87
acket     0.86
Activations Density: 0.005%