Explanations
negations and denials in statements
Negative Logits
etto      -0.16
idders    -0.15
ограф     -0.14
wakeup    -0.14
isine     -0.14
etti      -0.14
.motion   -0.14
irth      -0.14
ja        -0.13
ány       -0.13
Positive Logits
ndl       0.18
umas      0.17
γά        0.16
fts       0.15
Rout      0.15
yet       0.15
ropol     0.15
nad       0.14
remium    0.14
BK        0.14
Activations Density: 0.228%