INDEX
Explanations
phrases indicating extremity or intensity of action or opinion
phrases indicating actions or opinions that go to extremes or limits
Negative Logits
itu          -0.77
rio          -0.72
odes         -0.70
Puppet       -0.69
cyclopedia   -0.68
tu           -0.66
otten        -0.66
Pend         -0.64
ĭ            -0.63
odd          -0.63
Positive Logits
lengths      0.77
differently  0.76
stride       0.71
unnoticed    0.68
embro        0.66
persu        0.65
step         0.64
overr        0.62
HER          0.61
rug          0.60
Activations Density: 0.039%