INDEX
Explanations
words describing positive or negative experiences and emotions
Negative Logits
aucus      -0.69
arians     -0.63
aan        -0.62
arius      -0.61
inition    -0.61
Feder      -0.61
ARS        -0.60
govtrack   -0.59
ARE        -0.59
FIL        -0.59
Positive Logits
ries       1.29
surprises  1.04
ness       1.01
smelling   1.00
lihood     0.98
ties       0.96
pleasant   0.90
nesses     0.89
ments      0.85
surpr      0.81
Activations Density: 0.013%