INDEX
Explanations
references to a specific sports team
references to the term "Kings" in various contexts
Negative Logits
awaru            -0.83
eredith          -0.75
zzle             -0.70
enegger          -0.69
ierrez           -0.66
ATIONAL          -0.65
dissemination    -0.64
GoldMagikarp     -0.64
mble             -0.64
ramid            -0.63
Positive Logits
guard             1.05
Kings             1.05
Kings             0.98
hip               0.94
north             0.90
bury              0.87
lad               0.86
knife             0.85
pan               0.84
olver             0.83
Activations Density 0.010%