INDEX
Explanations
positive affirmations or expressions of approval
Negative Logits
Token        Logit
greatness    -0.19
psilon       -0.15
ittest       -0.15
elez         -0.15
antics       -0.15
ors          -0.15
goodness     -0.14
ousel        -0.14
ively        -0.14
s            -0.14
Positive Logits
Token        Logit
reads        0.34
bye          0.31
night        0.31
-quality     0.30
ie           0.29
Samar        0.27
ol           0.26
ole          0.26
-sized       0.25
-news        0.25
Activations Density: 0.088%