Explanations
references to novels and their attributes
Negative Logits
fully        -0.19
aan          -0.17
wards        -0.16
ed           -0.16
yor          -0.16
fulness      -0.15
àµįà´       -0.15
543          -0.15
ugu          -0.14
%B           -0.14
Positive Logits
-length      0.28
ists         0.25
ty           0.25
istic        0.25
ization      0.24
lette        0.24
izations     0.23
ized         0.23
isation      0.23
ised         0.21
Activations Density 0.014%