INDEX
Explanations
the word "mans" at varying levels of activation
mentions of the word "mans."
Negative Logits
Token       Logit
Jade        -0.72
Cortana     -0.67
Tune        -0.62
fusion      -0.61
Task        -0.60
Holo        -0.60
Hydra       -0.59
stream      -0.57
embedded    -0.57
num         -0.57
Positive Logits
Token       Logit
mans        4.62
mens        1.58
mans        1.50
men         1.35
man         1.33
mann        1.28
woman       1.13
kens        1.13
MAN         1.12
tons        1.09
Activations Density 0.007%
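
The density figure can be read as the share of sampled tokens on which the feature fires. The sketch below assumes that reading (activation strictly above a threshold of zero); the function name, the threshold, and the sample data are hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    activation. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 tokens, 7 of which activate the feature.
sample = [0.0] * 99_993 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1, 5.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.007%
```

Under this reading, a density of 0.007% corresponds to roughly 7 activating tokens per 100,000, which fits a feature tied to one rare word.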