Explanations
party, whole, assignment, Fed
Negative Logits
an        1.03
나        0.94
se        0.93
К         0.91
I         0.89
Мо        0.87
ed        0.87
entric    0.84
z         0.83
라        0.82
Positive Logits
tassels      0.85
тились       0.84
quilts       0.84
cucumbers    0.83
chestnuts    0.83
screws       0.83
giraffe      0.82
twigs        0.80
acorns       0.79
("^          0.79
Activations Density: 0.001%