Explanations
references to characters and their traits in narratives
Negative Logits
enberg     -0.19
ew         -0.18
ey         -0.17
wards      -0.16
ackers     -0.16
eko        -0.16
eni        -0.15
igkeit     -0.15
esh        -0.15
itzer      -0.15
Positive Logits
istically   0.23
isation     0.20
ized        0.20
izations    0.19
untime      0.18
ised        0.18
nels        0.17
ization     0.16
amburger    0.15
istics      0.15
Activation Density: 0.038%