Explanations
terms related to physical actions or conditions
words related to 'people' or 'individuals'
Negative Logits
Token          Logit
é¾             -0.79
sm             -0.63
sidel          -0.62
microscopic    -0.61
SHIP           -0.58
mindfulness    -0.57
numbered       -0.57
metic          -0.57
etheless       -0.57
selves         -0.57
Positive Logits
Token          Logit
cific          1.23
rer            1.15
anut           1.06
rers           1.04
ller           1.01
oples          1.00
ptin           0.96
cies           0.95
reon           0.95
cial           0.93
Activations Density: 0.027%