INDEX
Explanations
negative judgments and criticisms related to behavior and ethics
Negative Logits
благодаря   -0.19
thanks      -0.18
grâce       -0.16
thanks      -0.16
lia         -0.15
nhờ         -0.15
unya        -0.14
WithURL     -0.14
sẵn         -0.14
الیا        -0.14
Positive Logits
considering  0.25
border       0.20
beyond       0.19
indeed       0.18
especially   0.17
to           0.17
border       0.16
borders      0.15
behavior     0.15
Considering  0.15
Activations Density 0.146%