INDEX
Explanations
adjectives or nouns describing strong emotions or opinions
sentiments related to strong emotions and preferences
Negative Logits
helicop       -0.73
mans          -0.66
err           -0.66
conduct       -0.63
stages        -0.62
ghazi         -0.62
gow           -0.62
hematically   -0.60
engers        -0.60
Appendix      -0.60
Positive Logits
lessness      0.89
toward        0.89
towards       0.83
rence         0.80
iness         0.77
thirst        0.77
itiveness     0.76
aroused       0.76
acy           0.75
fascination   0.73
Activations Density 0.152%