INDEX
Explanations
descriptive details and interactions among characters in a narrative
Negative Logits
ciples  -0.77
Ľ       -0.76
uously  -0.69
orah    -0.68
istics  -0.68
orians  -0.66
asters  -0.66
ATURES  -0.65
ormons  -0.64
oby     -0.63
Positive Logits
pesky       1.15
kind        0.92
same        0.86
particular  0.85
sort        0.84
fateful     0.83
elusive     0.80
cher        0.77
type        0.76
sucker      0.72
Activations Density: 14.074%