INDEX
Explanations
negative responses or denials
Negative Logits
itzer       -0.19
/API        -0.17
loth        -0.15
898         -0.14
idle        -0.14
ulu         -0.14
riter       -0.14
ITO         -0.14
Reviewer    -0.14
API         -0.14
Positive Logits
spiel       0.16
apter       0.15
venta       0.15
edList      0.15
ìį¨         0.15
matter      0.14
ãĥĥãĥĪ      0.14
areth       0.14
ool         0.14
ore         0.14
Activations Density: 0.096%
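
Two of the positive-logit entries (ìį¨ and ãĥĥãĥĪ) look like raw vocabulary strings rather than readable text. The page does not say which tokenizer produced them, so the following is only a minimal sketch under the assumption that they come from a GPT-2 style byte-level BPE vocabulary, where every byte is displayed as a stand-in printable character. The helper name `decode_token` is hypothetical; `bytes_to_unicode` mirrors the mapping used by GPT-2's reference encoder.

```python
def bytes_to_unicode():
    """Byte -> display-character table used by GPT-2 style byte-level BPE."""
    # Bytes that are already printable keep their own character.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every remaining byte is shifted to a code point starting at 256.
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a vocabulary string back to bytes, then decode those bytes as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Tokens can end mid-character, so replace invalid sequences instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # The two non-ASCII entries from the positive-logit list, written as escapes.
    for token in ["\u00ec\u012f\u00a8", "\u00e3\u0125\u0125\u00e3\u0125\u012a"]:
        print(repr(token), "->", decode_token(token))
```

If the assumption holds, the two entries decode to the Korean syllable 써 and the Japanese kana ット, while plain ASCII entries such as itzer pass through unchanged. If the tokens come from a different tokenizer, the mapping does not apply and the lookup may raise a KeyError.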