INDEX
Explanations
Japanese, Korean, French, Spanish
Negative Logits
İK       0.39
𝐘        0.35
İN       0.35
ساين     0.35
㸸       0.35
𝐓        0.34
唓       0.34
⤥        0.34
SCHRAMM  0.34
𝗗        0.34
Positive Logits
k   0.40
h   0.38
h   0.37
iy  0.36
y   0.35
ma  0.35
na  0.35
ka  0.35
t   0.34
ku  0.34
Activations Density 0.155%