NEGATIVE LOGITS
ुझ  -0.16
quin  -0.15
how  -0.15
реб  -0.15
utin  -0.14
about  -0.13
оваться  -0.13
pping  -0.13
aldi  -0.13
рас  -0.13
POSITIVE LOGITS
against  0.25
Against  0.20
you  0.20
against  0.19
strongly  0.19
everyone  0.19
Against  0.18
/request  0.17
avoiding  0.17
anyone  0.17
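The two lists above are the tokens with the largest and smallest logit values associated with this feature. The sketch below shows one simple way to extract such top-k and bottom-k lists from a mapping of token to logit value. It assumes the values are already computed; how the dashboard derives them is not stated here, and the function name `top_and_bottom` and the sample data are hypothetical.

```python
import heapq

def top_and_bottom(logit_effects, k=10):
    """Return the k highest and k lowest (token, value) pairs.

    `logit_effects` is a hypothetical dict mapping token strings to the
    logit value attributed to the feature for that token.
    """
    items = list(logit_effects.items())
    # nlargest/nsmallest rank pairs by their value, not by the token string.
    top = heapq.nlargest(k, items, key=lambda kv: kv[1])
    bottom = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    return top, bottom

# Hypothetical sample built from a few values shown above.
sample = {"against": 0.25, "you": 0.20, "strongly": 0.19,
          "how": -0.15, "quin": -0.16, "about": -0.13}
top, bottom = top_and_bottom(sample, k=2)
print(top)     # [('against', 0.25), ('you', 0.2)]
print(bottom)  # [('quin', -0.16), ('how', -0.15)]
```

Using `heapq` avoids sorting the full vocabulary when only a handful of extremes are needed.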
Activation density: 0.055%
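Activation density is commonly read as the share of token positions on which the feature fires. A density of 0.055% therefore corresponds to roughly 55 active positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above zero; the threshold and the function name `activation_density` are assumptions, not the dashboard's documented definition.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as active when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical data: 11 active positions out of 20,000 tokens.
acts = [0.0] * 20_000
for i in range(0, 11_000, 1_000):  # indices 0, 1000, ..., 10000
    acts[i] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.055%
```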