Explanations
Proper nouns, particularly names and organizations.
Negative Logits
Token                 Logit
aned                  -0.17
eview                 -0.16
[soft hyphen]i        -0.15
/animations           -0.15
udur                  -0.15
rani                  -0.14
ёл                    -0.14
uentes                -0.13
endir                 -0.13
ute                   -0.13
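Tokenizers in the GPT-2 family store non-ASCII tokens in their vocabulary as byte-level strings, which is why a token such as フト can show up in raw form as ãĥķãĥĪ. Assuming this model uses such a GPT-2-style byte-level BPE tokenizer, the sketch below rebuilds the standard byte-to-character table and inverts it to recover readable text. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not taken from this page.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 byte-to-printable-character table (assumed scheme)."""
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte (control chars, space, 0x7F-0xA0, 0xAD) is shifted to 256+n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a raw byte-level vocabulary string back into readable UTF-8 text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥķãĥĪ", "ÂŃi", "Ġknight"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under this scheme, "Ġ" stands for a leading space and "ÂŃ" encodes the soft hyphen U+00AD, which is invisible when rendered.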
Positive Logits
Token                 Logit
elik                  0.15
flix                  0.15
chosen                0.14
.BLL                  0.14
izr                   0.14
Ricky                 0.14
knight                0.13
フト                  0.13
rip                   0.13
chosen                0.13
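Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below assumes that convention, using the raw product with no normalization; the page may scale the values differently. The function name `top_logit_effects` and the toy matrices are hypothetical.

```python
import numpy as np


def top_logit_effects(w_dec, w_unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, effect) pairs.

    w_dec:     decoder direction of one feature, shape (d_model,)
    w_unembed: unembedding matrix, shape (d_model, vocab_size)
    vocab:     token strings, length vocab_size
    """
    # Direct effect of the feature direction on every output logit.
    effects = w_dec @ w_unembed
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Toy example with d_model = 2 and a four-token vocabulary.
    w_dec = np.array([1.0, 0.0])
    w_unembed = np.array([[0.5, -0.2, 0.1, -0.4],
                          [9.0, 9.0, 9.0, 9.0]])
    neg, pos = top_logit_effects(w_dec, w_unembed, ["a", "b", "c", "d"], k=2)
    print("negative:", neg)
    print("positive:", pos)
```

Because the feature direction in the toy example is orthogonal to the second row, only the first row of `w_unembed` influences the ranking.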
Activation Density: 0.023%
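Activation density is usually reported as the fraction of tokens on which the feature fires, so 0.023% corresponds to roughly 23 tokens in every 100,000. The minimal sketch below assumes that "fires" means an activation strictly above zero and that per-token activations are available as a flat array. The function name `activation_density` is hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))


if __name__ == "__main__":
    # 23 active tokens out of 100,000 gives the density shown above.
    acts = np.zeros(100_000)
    acts[:23] = 1.0
    print(f"{activation_density(acts):.3%}")
```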