Negative Logits
imbalances  0.36
Obes        0.34
blatantly   0.33
antiated    0.32
черво       0.31
разнови     0.30
raged       0.30
다음과      0.30
:',         0.30
falsch      0.30
Positive Logits
👌          0.54
👍          0.52
!           0.51
👍          0.51
👌          0.49
choice      0.48
🙌          0.43
для         0.42
🙌          0.41
surv        0.41
Activation density: 0.061%
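Lists like the ones above are commonly produced by projecting a feature's output (decoder) direction through the model's unembedding matrix and keeping the k lowest- and k highest-scoring tokens. Activation density is commonly the fraction of tokens on which the feature fires. The sketch below assumes that convention. The function names `logit_effects` and `activation_density`, the toy vocabulary and the toy numbers are hypothetical and are not taken from this dashboard.

```python
from typing import List, Sequence, Tuple

Scored = List[Tuple[str, float]]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str],
                  k: int = 10) -> Tuple[Scored, Scored]:
    """Return (top_negative, top_positive) lists of (token, score).

    Assumption: unembed[i] is the unembedding vector for vocab[i], so each
    score is the dot product of the feature's decoder direction with it.
    """
    scores = [sum(d * w for d, w in zip(decoder_dir, row)) for row in unembed]
    # Sort ascending by score; sorted() is stable, so ties keep vocab order.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    top_negative = ranked[:k]        # most negative score first
    top_positive = ranked[::-1][:k]  # most positive score first
    return top_negative, top_positive


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of tokens on which the feature activation is strictly positive."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


if __name__ == "__main__":
    # Hypothetical toy model: 3-token vocabulary, 2-dimensional residual stream.
    vocab = ["a", "b", "c"]
    unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    neg, pos = logit_effects([0.5, 0.2], unembed, vocab, k=2)
    print([t for t, _ in neg], [t for t, _ in pos])
    print(activation_density([0.0, 1.2, 0.0, 0.3]))
```

Under this convention, a density of 0.061% would mean the feature is positive on roughly 6 of every 10,000 tokens.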