Explanations
the word "lock" or words related to security or control
Negative Logits
issance    -0.87
LV         -0.78
schild     -0.73
ãĤ¡        -0.71
enegger    -0.70
abama      -0.69
ilater     -0.68
olf        -0.66
xual       -0.66
resso      -0.66
Positive Logits
picking    1.25
heed       1.17
pick       1.09
creen      1.06
door       1.02
step       0.96
lear       0.94
hold       0.92
downs      0.91
horns      0.87
Activations Density 0.028%