INDEX
Explanations
proper nouns referring to individuals
proper nouns and names
Negative Logits
hyp       -0.79
Spectre   -0.76
crim      -0.73
ðĿ        -0.72
crim      -0.72
issan     -0.71
som       -0.70
cris      -0.70
oons      -0.69
trak      -0.69
Positive Logits
Be        1.89
Be        1.65
BE        1.48
Beet      1.45
be        1.43
BE        1.42
Bee       1.42
Bey       1.41
Becker    1.41
Beard     1.38
Activations Density 0.254%