INDEX
Explanations
references to characters within text
references to character design or traits in narratives
Negative Logits
æĸ¹         -0.73
ynthesis    -0.70
yip         -0.69
arcity      -0.66
kaya        -0.63
akening     -0.61
Accessory   -0.60
allows      -0.59
aughters    -0.59
owers       -0.59
Positive Logits
char        0.99
acters      0.96
itably      0.95
quer        0.79
ret         0.78
lie         0.76
\\\\\\\\    0.76
coon        0.75
isma        0.75
rette       0.74
Activations Density 0.006%