INDEX
Explanations
instances of phrases indicating belief and trust
Negative Logits
Token      Logit
legate     -0.15
TERS       -0.14
ignment    -0.14
legates    -0.14
agment     -0.14
mium       -0.14
enade      -0.14
lf         -0.13
anes       -0.13
afi        -0.13
Positive Logits
Token      Logit
?          0.16
underst    0.16
etheless   0.15
though     0.15
prisingly  0.14
umably     0.14
identally  0.14
ÙĮ         0.14
pecially   0.14
theless    0.14
Activations Density 0.269%