Explanations
mentions of romance novels and related literature themes
Negative Logits
Token         Logit
868           -0.19
rame          -0.15
rious         -0.15
Gram          -0.15
payday        -0.14
unken         -0.14
_KERNEL       -0.14
pter          -0.14
.TestTools    -0.14
865           -0.14
Positive Logits
Token         Logit
Faul          0.15
OutOfBounds   0.15
adt           0.15
磨            0.14
phinx         0.14
akh           0.14
acas          0.14
audi          0.14
arden         0.13
-series       0.13
Activations Density: 0.159%