Explanations
references to literary works and their authors
Negative Logits
Token     Logit
peare     -0.15
LLL       -0.15
Script    -0.14
Poetry    -0.14
Shown     -0.14
logan     -0.14
zan       -0.14
御        -0.13
Viewer    -0.13
ua        -0.13
Positive Logits
Token     Logit
novel     0.44
Novel     0.38
novels    0.35
nov       0.33
novelist  0.26
nov       0.25
小说      0.25
Nov       0.24
ovel      0.20
Booker    0.19
Activations Density 0.224%