INDEX
Explanations
words related to politics, policies, and societal issues
Negative Logits
Token        Logit
VIDIA        -0.59
WARD         -0.58
juven        -0.57
confir       -0.56
pestic       -0.55
enegger      -0.54
horm         -0.52
elaborated   -0.52
tiss         -0.52
thumbnails   -0.52
POSITIVE LOGITS
Token        Logit
shine        0.67
afloat       0.66
obsolete     0.60
accountable  0.59
aside        0.59
onto         0.58
into         0.57
away         0.55
mates        0.55
neys         0.55
Activations Density 5.752%