INDEX
Explanations
phrases indicating high and low quantities in comparisons
Negative Logits
121     -0.16
anny    -0.16
argo    -0.16
á»IJ    -0.15
ULO     -0.15
_fmt    -0.14
ren     -0.14
ward    -0.13
UX      -0.13
aq      -0.13
POSITIVE LOGITS
ovi      0.15
lest     0.14
borough  0.14
bounce   0.14
roken    0.14
licken   0.14
Ïģί      0.14
Nar      0.13
=sum     0.13
åĩĮ      0.13
Activations Density 0.021%