Explanations
names of prominent individuals
Negative Logits
Token        Logit
etas         -0.16
himself      -0.16
Levin        -0.15
Woo          -0.15
ÑĥÑģ         -0.15
Wo           -0.14
rist         -0.14
ead          -0.14
esis         -0.14
gentlemen    -0.14
Positive Logits
Token        Logit
#            0.17
jeme         0.16
twig         0.15
odata        0.15
icina        0.15
herself      0.14
âĢ¢↵↵        0.14
indow        0.14
ghest        0.14
inth         0.14
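
Two of the tokens listed here, "ÑĥÑģ" and "âĢ¢↵↵", are not corrupted text. They look like the printable stand-ins that byte-level BPE vocabularies use for raw UTF-8 bytes. The sketch below shows how such strings could be mapped back to readable text. It assumes the model uses the GPT-2 style byte-to-character table (the page does not say which tokenizer is in use) and that "↵" is the dashboard's display glyph for a newline; the function names are illustrative and not part of any dashboard API.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Build a GPT-2 style table mapping each byte to a printable character."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable stand-in character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Convert a byte-level token string back to readable UTF-8 text."""
    raw = bytearray()
    for ch in token:
        if ch == "↵":
            # Assumption: the dashboard renders a newline byte as this glyph.
            raw.append(0x0A)
        else:
            raw.append(CHAR_TO_BYTE[ch])
    # Tokens can end mid-character, so replace undecodable bytes.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ÑĥÑģ", "âĢ¢↵↵", "herself"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under these assumptions, "ÑĥÑģ" decodes to the Cyrillic fragment "ус" and "âĢ¢↵↵" to a bullet character followed by two newlines, while plain ASCII tokens such as "herself" pass through unchanged.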
Activations Density: 0.060%