NEGATIVE LOGITS
saying      1.17
getting     0.96
dizendo     0.94
gonna       0.93
anybody     0.90
going       0.89
everytime   0.88
letting     0.87
say         0.86
tikai       0.85
POSITIVE LOGITS
Indeed      0.85
Unlike      0.78
Indeed      0.74
כמ          0.72
unlike      0.72
of          0.70
cleanly     0.70
unächst     0.69
unlike      0.68
Of          0.67
Activations Density 0.016%