Explanations
Options, particularly former/latter
Negative Logits
Token              Value
(blank in source)  0.45
ossi               0.40
(blank in source)  0.40
AN                 0.39
”;                 0.39
reter              0.39
ළ                  0.38
(blank in source)  0.37
ct                 0.37
my                 0.36

Positive Logits
Token              Value
latter             0.79
后者               0.75
前者               0.63
vooral             0.57
particularly       0.57
někter             0.56
especially         0.55
особенно           0.55
niektor            0.55
cualquiera         0.52

Activations Density 0.258%