Explanations
after punctuation delimiters
Negative Logits
ق    0.75
q    0.73
ᆷ    0.60
3    0.59
ु    0.57
ᆯ    0.56
u    0.55
ik    0.54
ان    0.53
م    0.53
Positive Logits
savaş    0.63
dakkh    0.63
psik    0.59
saddo    0.59
dvara    0.59
harmonize    0.58
gimnas    0.57
kadın    0.56
atlet    0.56
poin    0.55
Activations Density 0.000%