INDEX
Explanations
terms following specific words
Negative Logits
Па       0.90
<eos>    0.84
На       0.83
Мо       0.83
Та       0.82
По       0.82
За       0.80
Ви       0.78
Не       0.78
До       0.76
Positive Logits
fundraising     1.13
laryng          1.02
är              1.01
antisemit       1.01
è               1.00
softball        0.99
melakukan       0.98
talks           0.98
sarà            0.98
outperformed    0.97
Activations Density 0.001%