Explanations
references to specific characters or attributes in a narrative context
Negative Logits
Token     Logit
rib       -0.19
Äijô      -0.16
warz      -0.15
hek       -0.15
oir       -0.15
.ide      -0.14
åĸ        -0.14
esso      -0.14
iface     -0.14
Lesser    -0.14
Positive Logits
Token     Logit
bet       0.18
Cory      0.15
614       0.14
amam      0.14
yme       0.14
Gür       0.14
spy       0.14
p         0.14
919       0.14
Hopkins   0.13
Activations Density: 0.001%