INDEX
Explanations
words related to proper nouns, specifically names
repeated mentions of a specific name
Negative Logits
Token        Logit
IBLE         -0.87
ーン         -0.80
ACTED        -0.79
ISC          -0.75
ources       -0.73
ال           -0.72
Gutenberg    -0.70
Reaper       -0.68
ワ           -0.68
FEMA         -0.67
Positive Logits
Token        Logit
loo          1.02
ky           1.01
ansky        0.86
aky          0.81
kees         0.81
lip          0.81
usha         0.81
pton         0.80
aku          0.79
Ky           0.79
Activations Density 0.015%