Explanations
phrases that indicate initial evaluations or observations about a subject
Negative Logits
ičky          -0.16
weiber        -0.15
outu          -0.15
instead       -0.15
istrovství    -0.15
follando      -0.14
IXEL          -0.14
вмест         -0.14
););↵         -0.14
yny           -0.14
Positive Logits
alone         0.30
Alone         0.26
it            0.25
แล            0.23
alone         0.21
this          0.21
thì           0.21
nothing       0.20
yes           0.20
perhaps       0.19
Activations Density 0.049%