INDEX
Explanations
negative or contradictory statements
Negative Logits
yük        -0.16
(rawValue  -0.16
inizi      -0.15
lain       -0.15
acer       -0.15
šov        -0.15
ланд       -0.14
Xd         -0.14
ارج        -0.14
è¡Ľ        -0.13
Positive Logits
anymore       0.25
ched          0.21
necessarily   0.18
ori           0.18
particularly  0.18
any           0.18
ching         0.18
slightest     0.18
yet           0.18
exact         0.17
Activations Density 0.465%