INDEX
Explanations
the word "very" followed by positive or neutral expressions
expressions of strong emotions or sentiments
Negative Logits
adelphia     -0.77
olor         -0.76
onis         -0.76
sburgh       -0.71
ourses       -0.68
osal         -0.67
zan          -0.67
heid         -0.67
Cheong       -0.66
ensis        -0.66
Positive Logits
important    0.95
informative  0.91
difficult    0.91
rare         0.88
interesting  0.88
exciting     0.87
useful       0.85
unlikely     0.85
handy        0.84
readable     0.83
Activations Density 0.064%
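
The density figure can be read as the share of token positions on which the feature fires at all. The sketch below assumes that definition (activation strictly above zero); the page itself does not state how the number is computed. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and exist only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions with a nonzero
    activation; this definition is not taken from the page above.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up): 10,000 token positions, 6 of which activate.
toy = [0.0] * 9994 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.7]
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.064% corresponds to the feature firing on roughly 6 or 7 of every 10,000 tokens, which fits a feature tied to a narrow context such as the word "very".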