Negative Logits
Origins      0.66
₂+           0.63
auxilia      0.63
zsche        0.63
benefits     0.63
benefici     0.63
观测         0.62
beneficial   0.61
amén         0.61
aiding       0.61
Positive Logits
NEVER        1.37
PLEASE       1.36
DO           1.33
NO           1.32
MOST         1.29
BEFORE       1.28
STOP         1.27
ALWAYS       1.24
DON          1.23
WE           1.22
Activations Density 0.372%