INDEX
Explanations
occurrences of the word "man" and its variations in context
Negative Logits
tone      -0.21
yaw       -0.18
ATIC      -0.17
y         -0.16
ted       -0.16
tach      -0.16
yen       -0.15
sense     -0.15
tt        -0.15
ounder    -0.15
Positive Logits
agements  0.34
tras      0.33
agment    0.31
made      0.31
hattan    0.31
ufac      0.30
handling  0.29
chester   0.29
handled   0.28
hole      0.28
Activations Density: 0.018%