INDEX
Explanations
phrases related to expressing concerns
instances of the contraction "don't"
Negative Logits
vulner        -0.50
pyramid       -0.45
sacrific      -0.43
scattering    -0.42
Antar         -0.42
convenience   -0.42
jog           -0.41
barr          -0.41
surv          -0.41
exha          -0.41
Positive Logits
ï¸ı           0.97
_>            0.63
east          0.62
âĤ¬           0.62
£             0.61
¯             0.61
tre           0.60
âĻ            0.60
$             0.59
reci          0.59
Activations Density 0.528%
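
Several positive-logit tokens (âĤ¬, ï¸ı, âĻ) look like encoding debris but may simply be raw vocabulary strings from a byte-level BPE tokenizer. The sketch below assumes a GPT-2-style byte-to-character table; the dashboard does not name the model or tokenizer, so this is an assumption, and the helper names are illustrative only.

```python
def bytes_to_unicode():
    """Byte -> printable character table in the style of GPT-2 byte-level BPE.

    Assumption: the dashboard's tokenizer uses this table; the source does not say.
    """
    # Bytes that are already printable keep their own code point.
    keep = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    chars = keep[:]
    n = 0
    # Every remaining byte is shifted to a code point starting at U+0100.
    for b in range(256):
        if b not in keep:
            keep.append(b)
            chars.append(256 + n)
            n += 1
    return dict(zip(keep, (chr(c) for c in chars)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a vocabulary string (hypothetical helper)."""
    return bytes(CHAR_TO_BYTE[ch] for ch in token)


if __name__ == "__main__":
    for tok in ["âĤ¬", "ï¸ı", "âĻ"]:
        raw = token_to_bytes(tok)
        # Incomplete UTF-8 sequences are shown with a replacement character.
        print(repr(tok), raw.hex(), repr(raw.decode("utf-8", errors="replace")))
```

Under that assumption, âĤ¬ maps to the bytes E2 82 AC (the euro sign), ï¸ı maps to EF B8 8F (U+FE0F, variation selector-16), and âĻ maps to E2 99, an incomplete UTF-8 prefix that only becomes a character once a following token supplies the last byte.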