INDEX
Explanations
mentions of characters and their roles in narratives
Negative Logits
seo      -0.17
day      -0.16
ese      -0.16
ew       -0.15
amer     -0.15
arious   -0.15
orget    -0.15
yan      -0.15
ÑĢа      -0.15
oyo      -0.15
Positive Logits
istically   0.36
izations    0.27
istics      0.25
isation     0.24
istik       0.24
izing       0.22
itics       0.21
ised        0.21
izes        0.20
ized        0.20
Activations Density: 0.035%