INDEX
Explanations
references to characters in a story
mentions of characters in narratives
Negative Logits
¿½            -0.75
sterdam       -0.71
ntil          -0.69
aband         -0.69
ת             -0.68
ateurs        -0.66
ackers        -0.66
obar          -0.65
rup           -0.65
roxy          -0.65
Positive Logits
acters        1.48
istically     1.13
istics        1.05
arcs          0.95
characters    0.91
Characters    0.88
izations      0.88
portray       0.85
portrayed     0.81
inhab         0.80
Activations Density 0.044%