Explanations
positive words or phrases
expressions of positive sentiment
Negative Logits
Token        Logit
loo          -0.73
Hearts       -0.69
HAEL         -0.69
wine         -0.69
opsy         -0.68
Brilliant    -0.67
ARDS         -0.64
AMES         -0.64
doms         -0.63
spo          -0.63
Positive Logits
Token        Logit
itional      1.63
itions       1.52
itivity      1.30
idon         1.21
itor         1.15
itive        1.15
itionally    1.15
itives       1.14
itory        1.13
icion        1.13
Activations Density: 0.059%