Explanation: names of specific people
Negative logits
Token        Logit
ashore       -0.71
Siren        -0.65
Clash        -0.62
dule         -0.62
izational    -0.60
abeth        -0.57
Dragonbound  -0.57
Carmen       -0.57
verson       -0.56
braces       -0.56

Positive logits
Token        Logit
eatures       1.05
ortun         1.02
lex           0.97
unction       0.97
ornia         0.95
lect          0.95
req           0.93
ruit          0.90
icient        0.88
rame          0.88
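
The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A minimal sketch of how such lists can be produced follows, assuming (this is not stated on the page) that each token's value is the dot product of the feature's decoder direction with that token's unembedding column, a common convention for SAE dashboards. The names `logit_effects` and `top_and_bottom_logits`, and the nested-list matrix layout, are hypothetical and chosen only for illustration.

```python
import heapq


def logit_effects(decoder_dir, unembed, vocab):
    """Hypothetical: per-token logit effect of one feature.

    Assumes decoder_dir has length d_model and unembed is a
    d_model x vocab_size matrix stored as nested lists, so the
    effect on token j is sum_i decoder_dir[i] * unembed[i][j].
    """
    return {
        tok: sum(d * unembed[i][j] for i, d in enumerate(decoder_dir))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom_logits(effects, k=10):
    """Return the k most positive and the k most negative (token, value) pairs."""
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return positive, negative


if __name__ == "__main__":
    # A few values copied from the lists above, used as a toy input.
    sample = {"eatures": 1.05, "ortun": 1.02, "ashore": -0.71, "Siren": -0.65}
    pos, neg = top_and_bottom_logits(sample, k=2)
    print(pos)  # [('eatures', 1.05), ('ortun', 1.02)]
    print(neg)  # [('ashore', -0.71), ('Siren', -0.65)]
```

Using `heapq.nlargest`/`nsmallest` avoids sorting the whole vocabulary when only the top and bottom ten entries are displayed.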

Activation density: 0.014%
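
An activation density of 0.014% means the feature is active on roughly 14 of every 100,000 tokens. The sketch below assumes, since the page does not define it, that density is the percentage of sampled tokens whose activation exceeds a threshold of zero. The function name `activation_density` and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical: percentage of tokens on which a feature fires.

    A token counts as 'firing' when its activation is strictly
    greater than threshold (assumed to be 0.0 here).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 14 firing tokens out of 100,000 sampled tokens.
    acts = [0.0] * 99_986 + [1.0] * 14
    print(f"{activation_density(acts):.3f}%")  # 0.014%
```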