INDEX
Explanations
terms related to naming and titles
Negative Logits
amins   -0.16
engin   -0.16
Pron    -0.16
elman   -0.15
abbo    -0.15
ymes    -0.14
AFX     -0.14
ahn     -0.14
_TW     -0.14
MAP     -0.14
Positive Logits
name    0.31
名稱    0.25
名称    0.25
名字    0.25
names   0.24
term    0.21
name    0.21
.name   0.21
tên     0.20
title   0.20
Activations Density 0.104%