INDEX
Explanations
names of famous individuals
proper nouns, particularly personal names and familial relationships
Negative Logits
LEVEL         -0.72
pmwiki        -0.71
tracking      -0.71
Borderlands   -0.68
arbitration   -0.67
dystopian     -0.65
merit         -0.65
CONCLUS       -0.65
sampling      -0.65
polar         -0.63
Positive Logits
mie           0.98
hyde          0.96
Jr            0.95
abeth         0.91
andro         0.88
nie           0.88
mi            0.87
ilde          0.87
lynn          0.87
anne          0.86
Activations Density: 0.193%