INDEX
Explanations
positive opinions about things
statements about personal favorites or notable experiences in films or music
Negative Logits
iot            -0.75
iership        -0.68
FAQ            -0.67
angering       -0.66
threat         -0.63
wake           -0.63
ammers         -0.62
usercontent    -0.62
encies         -0.61
ught           -0.61
Positive Logits
definitely      1.22
certainly       1.06
undoubtedly     1.04
probably        1.02
arguably        0.96
obviously       0.88
basically       0.87
perhaps         0.86
undeniably      0.85
another         0.83
Activations Density 0.309%