Explanations
phrases related to stating opinions or positions
variations of the letter 's' in words
Negative Logits
fret        -0.67
horses      -0.65
caps        -0.63
horse       -0.63
Spart       -0.60
Shiva       -0.60
Alexandria  -0.60
Strip       -0.60
Pharaoh     -0.59
souls       -0.59
Positive Logits
pecially     1.54
atisf        1.52
aturated     1.46
lightly      1.39
olved        1.37
olutions     1.34
udden        1.33
ixty         1.32
uddenly      1.32
ensitivity   1.32
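The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. One common way feature dashboards produce such lists, assumed here rather than stated on this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and sort the results. The sketch below uses a hypothetical two-dimensional toy setup: `decoder_dir`, `unembed`, and every number are made up for illustration (only the token strings are borrowed from the lists), and the helper names are not from any real library.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most negative and the k most positive tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    return ranked[:k], ranked[::-1][:k]

# Hypothetical toy values; a real model uses d_model-sized vectors and a full vocabulary.
decoder_dir = [1.0, -0.5]
unembed = {
    "uddenly": [1.2, -0.3],
    "horse": [-0.4, 0.5],
    "caps": [0.1, 0.2],
    "olved": [0.9, 0.0],
}
effects = logit_effects(decoder_dir, unembed)
negative, positive = top_and_bottom(effects, k=2)
print(negative[0][0], [tok for tok, _ in positive])  # horse ['uddenly', 'olved']
```

Under this assumption the lists describe which output tokens the feature promotes or suppresses when it fires, not which input tokens cause it to fire.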
Activation Density: 0.053%
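Activation density is usually reported as the share of sampled tokens on which the feature fires, i.e. has an activation above zero; that definition is an assumption here, since the page does not spell it out. At 0.053%, it works out to roughly 53 firing tokens per 100,000. A minimal sketch with a hypothetical `density_percent` helper and synthetic activations:

```python
from typing import Sequence

def density_percent(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Synthetic data: 53 firing positions out of 100,000 tokens.
acts = [0.0] * 99_947 + [1.0] * 53
print(f"{density_percent(acts):.3f}%")  # 0.053%
```

A density this low marks the feature as sparse: it is silent on almost every token and fires only in narrow contexts.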