INDEX
Explanations
adjectives describing the quality or impact of a situation
expressions of positive or negative outcomes related to policies and news
Negative Logits
token      logit
racuse     -0.82
hyde       -0.79
opers      -0.76
letes      -0.72
avorite    -0.71
uckle      -0.70
onds       -0.70
agos       -0.70
lete       -0.67
irez       -0.65
Positive Logits
token      logit
news       1.28
publicity  1.17
manners    0.98
news       0.95
bye        0.94
luck       0.94
enough     0.92
optics     0.92
karma      0.91
NEWS       0.89
Activations Density: 0.092%