INDEX
Explanations
mentions of the word "man" in various contexts
Negative Logits
erties         -0.18
engin          -0.18
genic          -0.17
gaard          -0.17
gui            -0.17
leanor         -0.17
ën             -0.16
rganization    -0.16
ALLY           -0.16
ted            -0.16
Positive Logits
iac            0.40
hattan         0.34
agements       0.30
ufac           0.29
agment         0.28
who            0.27
hunt           0.27
ifold          0.27
tras           0.26
-child         0.26
Activations Density: 0.089%