Explanations
numbers in different languages
Negative Logits
Rajas        0.84
rebel        0.72
Rebel        0.66
prospect     0.65
Someone      0.65
myst         0.65
낮은         0.64
inclination  0.64
τ            0.63
Prospect     0.63
Positive Logits
೬            0.88
twos         0.86
дву          0.86
ẵn           0.86
dois         0.85
четыре       0.85
kahe         0.85
२४           0.84
deux         0.83
quatre       0.82
Activations Density 0.309%