INDEX
Explanations
positive descriptors, particularly the word "nice" and its variations
NEGATIVE LOGITS
soever      -0.18
greatness   -0.17
tes         -0.17
slightest   -0.16
/Branch     -0.16
aries       -0.15
utes        -0.15
lan         -0.15
OM          -0.15
lu          -0.15
POSITIVE LOGITS
-looking    0.21
-sized      0.18
little      0.18
olson       0.17
nice        0.17
nice        0.17
surpr       0.16
ень         0.16
clean       0.16
surprises   0.16
Activations Density 0.021%