INDEX
Explanations
names of people or characters associated with historical or fictional contexts
Negative Logits
aille       -0.16
yd          -0.16
Jacqu       -0.16
VERT        -0.15
hood        -0.15
FFFFFFFF    -0.15
ffffffff    -0.15
hots        -0.14
rede        -0.14
.sd         -0.14
Positive Logits
zen         0.55
ze          0.54
zer         0.49
z           0.48
zes         0.45
za          0.44
zs          0.43
zt          0.42
zi          0.41
zo          0.41
Activations Density: 0.038%