INDEX
Explanations
positive information or highlights in text
phrases that convey positive news or highlights about various topics
Negative Logits
ivalent     -0.74
indust      -0.70
adult       -0.70
heit        -0.70
20439       -0.70
igmatic     -0.68
ancies      -0.67
amental     -0.66
throp       -0.66
urch        -0.64
Positive Logits
bonus         0.71
avoids        0.67
cures         0.66
additions     0.66
luckily       0.65
:]            0.64
rewards       0.64
overlooking   0.63
cushion       0.63
Bonus         0.62
Activation Density: 0.166%