Negative Logits
ри        0.99
          0.90
ani       0.83
ä         0.79
ine       0.79
í         0.75
haven     0.71
ulation   0.68
raz       0.66
wash      0.66
Positive Logits
B         1.05
8         1.05
К         1.03
D         0.98
6         0.96
G         0.95
K         0.94
L         0.92
5         0.91
Ме        0.89
Activations Density 0.004%