INDEX
Explanations
negations or words indicating the absence of something
Negative Logits
Token        Logit
же           -0.16
گرد          -0.16
berger       -0.16
ingu         -0.15
nor          -0.15
hores        -0.15
hoe          -0.15
ickerView    -0.15
eenth        -0.14
/***/        -0.14
Positive Logits
Token          Logit
ori            0.25
surprisingly   0.21
only           0.21
everyone       0.21
knowing        0.20
everything     0.20
sure           0.19
having         0.19
tingham        0.19
least          0.18
Activations Density: 0.065%