INDEX
Explanations
proper nouns, particularly personal names and names of entities related to Japan
Negative Logits
appa        -0.17
moth        -0.17
άλι         -0.16
ekl         -0.16
_nsec       -0.15
vara        -0.15
ansa        -0.15
ephir       -0.15
asa         -0.14
ismatic     -0.14
Positive Logits
oney        0.17
[missing]   0.16
åĦĢ         0.15
ogue        0.14
Ell         0.14
wald        0.14
ess         0.14
ire         0.14
opoulos     0.14
cent        0.14
Activations Density 0.067%