NEGATIVE LOGITS
Token         Logit
themſelves    -0.94
Theſe         -0.92
Houſe         -0.88
himſelf       -0.85
itſelf        -0.85
Diſ           -0.81
ſmall         -0.80
myſelf        -0.79
whom          -0.79
houſe         -0.77
POSITIVE LOGITS
Token         Logit
ever          0.86
is            0.79
,             0.66
was           0.63
e             0.62
we            0.60
has           0.59
se            0.57
continues     0.56
Begriffsklä   0.56
ACTIVATION DENSITY: 0.034%