INDEX
NEGATIVE LOGITS
themſelves      -0.87
ſtate           -0.87
complexContent  -0.78
hematical       -0.77
punishment      -0.77
ugeot           -0.77
poverty         -0.76
ſelves          -0.76
Дереккөздер     -0.76
purpoſe         -0.76
POSITIVE LOGITS
s     0.73
v     0.59
f     0.56
if    0.54
w     0.54
odo   0.53
in    0.52
r     0.52
ra    0.51
h     0.51
ACTIVATIONS DENSITY 1.314%