INDEX
Explanations
specific names and terms associated with people, places, or titles
Negative Logits
rou          -0.17
istrovství   -0.16
ower         -0.15
ãĤ           -0.15
ahoma        -0.15
κι           -0.14
plication    -0.14
gings        -0.14
ings         -0.14
ách          -0.14
Positive Logits
osate        0.20
zelf         0.20
xs           0.18
ة            0.17
ele          0.17
ะ            0.17
embre        0.15
zsche        0.15
zo           0.15
annis        0.15
Activations Density 0.572%