Explanations
st, tr, ch, th, br prefixes
Negative Logits
.        0.40
ний      0.40
↵        0.39
ährt     0.37
➰        0.36
वैसे      0.36
गतान     0.36
πολύ     0.35
není     0.35
dikenal  0.34
Positive Logits
c           0.54
د           0.54
it          0.51
ال          0.42
embeddings  0.42
confir      0.41
ból         0.41
ก           0.41
ام          0.40
ت           0.40
Activations Density: 0.059%