INDEX
Explanations
SU, BE, DE, SE, MO, AN, TR, MO, EX, SO
Negative Logits
ş    0.49
ătoare    0.47
şk    0.44
țin    0.43
ătur    0.42
ị    0.41
ပိုင်း    0.41
ón    0.40
ovací    0.40
nički    0.39
Positive Logits
ORY    0.63
RE    0.62
AN    0.61
ER    0.61
UR    0.59
ON    0.59
IN    0.59
ATT    0.59
INA    0.57
AR    0.55
Activations Density: 0.031%