Explanations
adverbs of manner and extent
Negative Logits
of    0.69
ruas    0.63
to    0.62
ahli    0.60
២    0.59
luas    0.55
ﺍ    0.55
keber    0.55
akses    0.55
neuen    0.55
Positive Logits
에서도    0.47
,    0.47
에도    0.46
М    0.44
पणे    0.43
7    0.41
료    0.41
،    0.40
С    0.40
they    0.39
Activations Density: 0.502%