Explanations
occurrences of the word "man" in various contexts
Negative Logits
erties      -0.17
afen        -0.15
maal        -0.15
ted         -0.15
erte        -0.15
iran        -0.15
人物        -0.15
ayne        -0.15
$MESS       -0.14
istes       -0.14
Positive Logits
iac         0.31
hattan      0.28
ufac        0.24
hunt        0.23
hood        0.23
agment      0.22
agements    0.22
tras        0.22
opause      0.21
ifold       0.21
Activations Density 0.071%