INDEX
Explanations
names of stories and books in a review context
Negative Logits
Token      Logit
VP         -0.16
vp         -0.14
vp         -0.14
taire      -0.14
eto        -0.13
Frid       -0.13
Picasso    -0.13
jab        -0.13
alu        -0.13
oa         -0.13
Positive Logits
Token      Logit
nov        0.32
novel      0.30
novels     0.30
nov        0.24
books      0.24
volumes    0.24
book       0.23
series     0.23
volume     0.22
Novel      0.22
Activations Density: 0.155%
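
As a rough illustration of how a density figure like this might be read, the sketch below assumes that activation density is the fraction of sampled token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample values are assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition: density = (# positions with activation > threshold)
    divided by the total number of sampled positions.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2000 token positions, 3 of which activate the feature.
sample = [0.0] * 1997 + [0.7, 1.2, 0.05]
density = activation_density(sample)

# Format as a percentage with three decimals, matching the style of the page.
print(f"Activations Density: {density:.3%}")
```

Under this reading, a density of 0.155% would mean the feature fires on roughly 1 or 2 of every 1000 sampled tokens, which fits a feature tied to a narrow topic such as book and story titles.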