Explanations
references to literary works or books in various contexts
references to religious texts and their titles
Negative Logits
Token        Logit
ilitary      -0.94
asio         -0.81
orescence    -0.79
sclerosis    -0.70
corrosion    -0.69
ilty         -0.69
00200000     -0.64
democracy    -0.62
adow         -0.62
undai        -0.61
Positive Logits
Token        Logit
stores       1.24
marks        1.18
Book         1.13
book         1.10
mark         1.09
she          1.05
store        1.02
Book         1.02
worm         1.01
seller       0.96
Activations Density: 0.019%
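
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of token positions on which the feature's activation is above zero. This definition is an assumption, since the page does not say how the value was derived. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data (not from the source): 19 active positions out of 100,000.
sample = [0.0] * 99_981 + [1.0] * 19
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.019% would mean the feature fires on roughly 19 of every 100,000 tokens, which is consistent with a sparse feature tied to a narrow topic such as books.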