Explanation: mentions of specific names or titles
Negative Logits
  Token      Logit
  Jonah      -0.71
  ype        -0.64
  cock       -0.63
  raltar     -0.63
  Goat       -0.63
  Plains     -0.63
  dick       -0.62
  Jericho    -0.62
  ESE        -0.62
  Canaan     -0.62
Positive Logits
  Token      Logit
  herself     1.32
  pher        0.84
  athed       0.83
  Anne        0.77
  lled        0.77
  lashes      0.76
  athing      0.75
  vagina      0.75
  husband     0.75
  Marie       0.75
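Feature dashboards of this kind usually rank tokens by the feature's direct effect on the output logits. That effect is commonly computed by projecting the feature's decoder direction through the model's unembedding matrix. The page does not say how these particular values were produced, so what follows is only a minimal sketch under that assumption. The functions `logit_effects` and `top_logit_effects`, along with their inputs, are hypothetical illustrations and not this dashboard's actual code.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical: dot the feature's decoder direction with each token's
    unembedding column (assumed layout: token -> vector of length d_model)."""
    return {tok: sum(d * w for d, w in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_logit_effects(effects: Dict[str, float], k: int = 10
                      ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]                   # most negative first
    positive = list(reversed(ranked[-k:]))  # most positive first
    return negative, positive

# Usage with a few of the values listed above:
effects = {"Jonah": -0.71, "ype": -0.64, "herself": 1.32, "pher": 0.84, "Anne": 0.77}
neg, pos = top_logit_effects(effects, k=2)
print(neg)
print(pos)
```

Subword fragments such as "ype" or "athed" appear in these lists because the ranking runs over raw vocabulary tokens, not whole words.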
Activations Density 0.068%
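Activation density is typically reported as the share of token positions on which the feature fires, meaning its activation is above zero. Under that assumption, 0.068% corresponds to roughly 68 firing positions per 100,000 tokens. The sketch below is illustrative only. The function name `activation_density` and its `threshold` parameter are hypothetical, and the dashboard's exact threshold and sampled corpus are not stated.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`
    (assumed definition; threshold of 0.0 is a hypothetical default)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# 68 firing positions out of 100,000 tokens gives about 0.068%.
print(activation_density([0.0] * 99_932 + [1.0] * 68))
```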