INDEX
Explanations
proper nouns, particularly names of individuals and organizations
Negative Logits
607       -0.15
wers      -0.14
upe       -0.14
æľĭ       -0.14
erts      -0.14
usc       -0.14
µ         -0.13
erre      -0.13
IFn       -0.13
ivre      -0.13

Positive Logits
kas       0.16
RICT      0.14
Colon     0.14
fug       0.14
fried     0.13
aging     0.13
stagram   0.13
æĥ        0.13
rea       0.12
COMMENTS  0.12
Activations Density 0.075%