INDEX
Explanations
proper nouns or specific phrases identifying individuals or locations
phrases indicating someone's ability or potential to perform an action
Negative Logits
hating      -0.68
striving    -0.63
chasing     -0.60
Shaw        -0.60
stalking    -0.59
rejecting   -0.58
rejection   -0.58
Confeder    -0.58
Simone      -0.58
bush        -0.56
Positive Logits
't          1.35
adian       1.18
berra       1.17
isters      1.08
afford      1.00
asta        0.96
NOT         0.96
attest      0.93
ister       0.91
idate       0.89
Activations Density: 0.244%