INDEX
Explanations
references to and descriptions of characters in stories
Negative Logits
Token     Logit
yg        -0.74
VERTIS    -0.72
sterdam   -0.69
gard      -0.68
rup       -0.68
IDA       -0.64
VEN       -0.63
ven       -0.63
yden      -0.62
LOCK      -0.62
Positive Logits
Token           Logit
acters          1.55
istically       1.38
istics          1.34
izations        1.06
isations        0.93
izes            0.90
arcs            0.88
assassination   0.87
assassinate     0.86
traits          0.84
Activations Density: 0.620%