INDEX
Explanations
negative phrases related to approvals or conditions
NEGATIVE LOGITS
Ŀ          -0.18
peri       -0.15
ÙĬز        -0.14
å£         -0.14
croft      -0.14
æľ¯        -0.14
ÛĮز        -0.14
Äijá»Ŀi    -0.14
osaurs     -0.14
ạc         -0.14
POSITIVE LOGITS
latter     0.18
ouro       0.17
itself     0.15
MAD        0.15
Ging       0.15
ENCE       0.15
indeed     0.15
meaning    0.15
himself    0.14
wash       0.14
Activations Density 0.149%