INDEX
Explanations
words related to literature and novels
repeated references to the concept of a novel
Negative Logits
tics        -0.70
cling       -0.70
adows       -0.70
ĪĴ          -0.68
disbanded   -0.66
respect     -0.65
olid        -0.65
aviour      -0.65
adow        -0.65
henko       -0.64
Positive Logits
novel       1.17
ties        1.01
izations    0.93
novels      0.90
isations    0.90
sworth      0.87
Novel       0.83
nets        0.83
manuscript  0.80
screenplay  0.80
Activations Density 0.009%