Explanations
Proper nouns, especially those related to political figures, locations, or organizations
Negative Logits
pires      -0.72
ãĥĦ        -0.57
sleeps     -0.56
ãĥ¯ãĥ³     -0.54
ceases     -0.54
guiIcon    -0.53
ãĥīãĥ©     -0.52
guyen      -0.52
Ö¼         -0.52
ª          -0.50
Positive Logits
respectively   1.36
apiece         1.24
themselves     0.85
whereas        0.83
nowadays       0.73
their          0.73
.              0.72
anyways        0.70
because        0.70
*.             0.70
Activations Density 0.840%
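
Several entries in the negative-logits list (for example the ones beginning with "ãĥ") look like raw byte-level BPE token strings rather than readable text. The sketch below assumes the vocabulary uses the GPT-2-style byte-to-character table, which this page does not confirm, and maps such strings back to the UTF-8 text they stand for. The names byte_to_unicode and decode_token are illustrative helpers, not part of any dashboard API.

```python
def byte_to_unicode():
    """Rebuild the byte -> printable-character table used by GPT-2-style
    byte-level BPE (assumed here; the dashboard does not name its tokenizer)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    table = {b: chr(b) for b in printable}
    shift = 0
    # Bytes without a printable form are assigned code points from 256 upward.
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


UNICODE_TO_BYTE = {ch: b for b, ch in byte_to_unicode().items()}


def decode_token(token):
    """Map a displayed token string back to text.

    Characters outside the table are kept as their own UTF-8 bytes; byte
    sequences that are not valid UTF-8 (token fragments) become U+FFFD.
    """
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Token strings from the negative-logits list, written as escapes.
    samples = [
        "\u00e3\u0125\u0126",                    # decodes to katakana "ツ"
        "\u00e3\u0125\u00af\u00e3\u0125\u00b3",  # decodes to "ワン"
        "\u00e3\u0125\u012b\u00e3\u0125\u00a9",  # decodes to "ドラ"
        "pires",                                 # plain ASCII is unchanged
    ]
    for tok in samples:
        print(repr(tok), "->", decode_token(tok))
```

Under that assumption, the three multi-character entries decode to Japanese katakana fragments, "Ö¼" decodes to a Hebrew combining mark (U+05BC), and "ª" is a single continuation byte that is not valid UTF-8 on its own, so it comes back as the replacement character.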