INDEX
Explanations
names of individuals
proper nouns, specifically names of people
New Auto-Interp
Negative Logits
  cake      -0.65
  OPS       -0.62
  Sussex    -0.61
  Legion    -0.60
  Cure      -0.60
  ually     -0.60
  Blaze     -0.60
  jack      -0.60
  Word      -0.60
  FW        -0.59
Positive Logits
  quist      1.19
  gren       1.17
  sky        1.05
  kson       0.99
  qv         0.90
  enegger    0.90
  hetti      0.90
  chuk       0.89
  afort      0.89
  ramid      0.88
Activation density: 0.022%
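The two summaries on this card can be reproduced with a small sketch. It assumes, without confirmation from the card itself, that the logit values are the feature's direct effect on the output logits (its decoder direction dotted with each token's unembedding row) and that "activation density" is the percentage of token positions where the feature's activation is above zero. The helper names direct_logit_effect, top_logit_tokens and activation_density are hypothetical, chosen for illustration.

```python
from typing import List, Sequence, Tuple


def direct_logit_effect(direction: Sequence[float],
                        unembed: Sequence[Sequence[float]]) -> List[float]:
    # ASSUMPTION: unembed holds one row per vocabulary token, each of length
    # d_model; the feature's effect on a token's logit is the dot product of
    # the feature direction with that token's row.
    return [sum(d * w for d, w in zip(direction, row)) for row in unembed]


def top_logit_tokens(logit_effect: Sequence[float], vocab: Sequence[str],
                     k: int = 10) -> Tuple[List[Tuple[str, float]],
                                           List[Tuple[str, float]]]:
    # Pair each token with its logit effect and sort ascending by value.
    ranked = sorted(zip(vocab, logit_effect), key=lambda p: p[1])
    negative = ranked[:k]          # k most negative effects, most negative first
    positive = ranked[::-1][:k]    # k most positive effects, most positive first
    return negative, positive


def activation_density(activations: Sequence[float]) -> float:
    # ASSUMPTION: a feature "fires" on a token when its activation is > 0.
    # Returns the firing rate as a percentage of all token positions.
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Toy data reusing four (token, value) pairs from the card above.
    vocab = ["cake", "quist", "sky", "OPS"]
    effect = [-0.65, 1.19, 1.05, -0.62]
    neg, pos = top_logit_tokens(effect, vocab, k=2)
    print(neg)  # [('cake', -0.65), ('OPS', -0.62)]
    print(pos)  # [('quist', 1.19), ('sky', 1.05)]
    print(activation_density([0.0, 0.0, 0.5, 0.0]))  # 25.0
```

Under this reading, a density of 0.022% means the feature is active on roughly 22 of every 100,000 tokens, which fits a narrowly targeted feature such as one for name fragments.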