INDEX
Explanations
positive assessments of books and written works
Negative Logits
repr        -0.15
happier     -0.15
fo          -0.15
uring       -0.15
glorious    -0.14
rebut       -0.14
ailable     -0.13
succinct    -0.13
beloved     -0.13
Radi        -0.13
Positive Logits
informative      0.28
Inform           0.24
inform           0.23
informat         0.23
Inform           0.23
informational    0.22
enlight          0.22
eye              0.22
instruct         0.21
educational      0.20
Activations Density: 0.192%