INDEX
Explanations
names of people
proper nouns, particularly names of individuals and specific entities
Negative Logits
region     -0.69
wolves     -0.68
ories      -0.67
rals       -0.65
orical     -0.65
ĭ          -0.65
yrinth     -0.64
central    -0.64
runners    -0.64
dated      -0.63
Positive Logits
Sr           1.28
Jr           1.16
III          0.98
Productions  0.91
aka          0.88
Presents     0.87
Returns      0.87
QC           0.85
(@           0.81
ovich        0.80
Activation Density: 0.165%