INDEX
Negative Logits
yourselves    -0.98
yourself      -0.87
yours         -0.73
Your          -0.73
Yourself      -0.73
himſelf       -0.72
deafness      -0.72
SharedDtor    -0.71
adpleegd      -0.70
your          -0.69
Positive Logits
they          0.52
')->          0.48
}();          0.44
They          0.43
pick          0.42
mereka        0.42
躇            0.42
they          0.41
Figure        0.41
neer          0.41
Activations Density 0.147%