Explanations
phrases related to strong opinions or criticisms
Negative Logits
ounces       -0.76
thood        -0.73
Ò            -0.72
contained    -0.72
perse        -0.70
âĢº          -0.67
overseen     -0.67
imi          -0.66
resembling   -0.66
Includes     -0.65
Positive Logits
oret          1.65
resa          1.34
downside      1.29
easiest       1.22
reason        1.21
irony         1.20
biggest       1.19
ories         1.18
simplest      1.18
implication   1.15
Activations Density: 0.450%