INDEX
Explanations
names of individuals
proper nouns, particularly names, predominantly related to individuals
Negative Logits
女        -0.69
MAC      -0.66
ulhu     -0.64
raped    -0.64
uously   -0.61
depend   -0.60
yip      -0.59
AST      -0.59
FANTASY  -0.56
Leia     -0.55
Positive Logits
uner     0.79
ĸļ       0.73
enhagen  0.73
backer   0.71
runner   0.70
onson    0.70
zen      0.69
roth     0.69
nih      0.68
enberg   0.67
Activations Density: 0.132%