Explanations
political discussions and financial issues
Negative Logits
уча      -0.15
konus    -0.13
simply   -0.13
agma     -0.12
undan    -0.12
nab      -0.12
Bend     -0.12
àng      -0.12
olet     -0.11
acias    -0.11
Positive Logits
not      0.91
NOT      0.75
not      0.75
Not      0.60
Not      0.57
-not     0.57
not      0.56
NOT      0.54
_not     0.52
.not     0.50
Activations Density 0.405%