INDEX
Explanations
mentions of books
references to books and their accessibility
Negative Logits
Token       Logit
00200000    -0.74
Yin         -0.70
alty        -0.67
Normandy    -0.66
saline      -0.63
Security    -0.63
Vital       -0.63
Bots        -0.62
assis       -0.62
ntil        -0.60
Positive Logits
Token       Logit
stores      1.33
marks       1.12
books       1.11
worms       1.06
hel         1.05
books       0.97
shop        0.97
poons       0.95
book        0.94
marked      0.93
Activations Density 0.028%