INDEX
Explanations
names or proper nouns
suffixes or fragments of words
Negative Logits
Stella      -0.63
Paula       -0.60
Nar         -0.59
Vengeance   -0.58
faire       -0.57
BUR         -0.54
Psycho      -0.54
Lily        -0.54
CAP         -0.54
Pix         -0.53
Positive Logits
enegger      1.12
himself      1.04
testified    0.98
oversaw      0.90
wrote        0.88
's           0.87
penned       0.87
baum         0.85
told         0.85
specializes  0.84
Activations Density: 0.166%