Explanations
words related to specific individuals, like names
the letter 'k' in various contexts
Negative Logits
mosqu           -0.91
emergencies     -0.72
behavi          -0.68
constitu        -0.67
ãĥ¯             -0.67
contraceptives  -0.67
liberties       -0.66
wcsstore        -0.66
Malf            -0.65
decay           -0.63
Positive Logits
ansas           1.29
irk             1.15
idding          1.15
rieg            1.13
orea            1.10
won             1.01
itty            1.00
icker           0.98
ota             0.95
bps             0.92
Activations Density: 0.038%