Explanations
instances of negation or lack of action
Negative Logits
rencont      -0.17
imoto        -0.15
odox         -0.14
abor         -0.14
indrome      -0.14
---</        -0.14
/INFO        -0.14
agoon        -0.14
ustos        -0.13
apus         -0.13
Positive Logits
ing          0.15
of           0.15
iqu          0.14
rix          0.13
he           0.12
/renderer    0.12
zar          0.12
ÑĢÑĸп       0.12
’t           0.12
Curtis       0.12
Activations Density: 1.935%