INDEX
Explanations
references to characters and their traits in narratives
Negative Logits
Token              Logit
ery                -0.17
air                -0.17
.sharedInstance    -0.16
enas               -0.16
erton              -0.16
seo                -0.16
chner              -0.15
ADA                -0.15
iam                -0.15
tures              -0.14
Positive Logits
Token              Logit
istically          0.27
izations           0.22
isation            0.21
ised               0.19
ually              0.18
ized               0.18
pent               0.18
ize                0.18
nels               0.17
izing              0.16
Activations Density: 0.037%