Explanations
assertive or positive statements about products or concepts
Negative Logits
itr  -0.16
Reliable  -0.15
unforgettable  -0.14
æħİ  -0.14
éré  -0.14
Capability  -0.14
abbo  -0.14
otu  -0.14
doub  -0.14
usercontent  -0.14
Positive Logits
interesting  0.30
interesting  0.28
Interesting  0.27
Interesting  0.25
particularly  0.25
especially  0.25
interess  0.23
particularly  0.22
fascinating  0.22
especially  0.21
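Dashboards of this kind commonly build the logit lists by projecting the feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the resulting score. The sketch below assumes that procedure. The function name top_logits, the list-of-rows matrix layout, and the toy vocabulary and weights are hypothetical and are not taken from this dashboard.

```python
def top_logits(decoder_dir, unembed, vocab, k=10):
    """Rank vocabulary tokens by the projection of a feature direction.

    Assumption: decoder_dir has length d_model, and unembed is a
    d_model x vocab_size matrix stored as a list of rows.
    Returns (positive, negative) lists of (token, score) pairs.
    """
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(len(vocab))
    ]
    # Sort ascending by score: the most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative


# Hypothetical toy example: d_model = 2, vocabulary of three tokens.
decoder_dir = [1.0, -0.5]
unembed = [[0.2, -0.1, 0.0],
           [0.0, 0.2, 0.2]]
vocab = ["a", "b", "c"]
positive, negative = top_logits(decoder_dir, unembed, vocab, k=1)
print(positive, negative)
```

With real model weights, the same ranking over the full vocabulary would produce lists shaped like the two above, assuming the dashboard uses this projection.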
Activation Density: 0.018%
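Activation density is usually reported as the share of sampled token positions on which the feature fires. At 0.018%, that works out to roughly 18 firing positions per 100,000 tokens. The sketch below assumes "fires" means an activation above a threshold of zero. The function name and the threshold parameter are hypothetical, and the sample corpus behind this dashboard's figure is not stated here.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold.

    Assumption: activations is a list of sequences, each a list of
    per-token activation values for a single feature.
    """
    total = 0
    active = 0
    for seq in activations:
        for value in seq:
            total += 1
            if value > threshold:
                active += 1
    # Guard against an empty sample instead of dividing by zero.
    return 100.0 * active / total if total else 0.0


# Hypothetical toy sample: one of four positions is active.
print(activation_density([[0.0, 0.3, 0.0, 0.0]]))
```

For the toy sample above, the function returns 25.0, because one of the four positions is active.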