INDEX
Explanations
prepositional phrases describing a negative judgment or critique
phrases indicating consequences or implications
Negative Logits
ARM         -0.66
949         -0.65
Interested  -0.65
LOCK        -0.64
perse       -0.63
NV          -0.63
Dist        -0.63
amus        -0.60
RL          -0.60
rament      -0.60
Positive Logits
behalf      1.60
occasion    1.32
erous       1.17
steroids    1.11
etime       1.08
occasions   1.03
eday        1.02
eness       1.01
shore       0.96
slaught     0.96
Activations Density 0.299%