Explanations
phrases indicating belief or conviction
Negative Logits
ses      -0.17
uell     -0.16
quel     -0.15
yny      -0.15
lal      -0.15
iere     -0.15
filer    -0.14
wich     -0.14
ercul    -0.14
ega      -0.14
Positive Logits
fulness  0.15
worth    0.15
fully    0.15
lessly   0.15
618      0.15
انس      0.15
-bel     0.15
ROUP     0.14
ances    0.14
608      0.14
Activations Density 0.052%