INDEX
Explanations
names of famous people and places
proper nouns, specifically names of individuals
Negative Logits
RIS        -0.82
tremend    -0.76
Corpus     -0.71
occas      -0.69
metic      -0.69
pione      -0.68
laun       -0.68
exting     -0.67
Citiz      -0.67
enthusi    -0.65
Positive Logits
anyahu     0.96
imore      0.94
inson      0.92
rick       0.86
eret       0.86
ison       0.86
ridor      0.84
rake       0.84
cox        0.84
rigan      0.84
Activations Density 0.148%