INDEX
Explanations
political and news-related terms and phrases
Negative Logits
xual       -0.72
ones       -0.67
TEXTURE    -0.67
76561      -0.65
tions      -0.64
:-         -0.64
AAF        -0.62
Anon       -0.61
lihood     -0.60
naires     -0.59
Positive Logits
smartest    0.66
celeb       0.65
anooga      0.62
Inside      0.61
digest      0.59
inion       0.58
Kavanaugh   0.57
noon        0.57
uncover     0.57
akespeare   0.57
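One common way to produce ranked lists like the two above is to measure a feature's direct effect on the output vocabulary: take the feature's decoder (output) direction, multiply it by the model's unembedding matrix, and sort the resulting per-token scores. The sketch below assumes exactly that; the names `decoder_dir`, `W_U`, and `top_logit_effects` are hypothetical, and the dashboard does not state how its values were computed or whether they were normalized or passed through a final layer norm.

```python
import numpy as np


def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct logit effect (sketch).

    decoder_dir: shape (d_model,), the feature's output direction (assumed).
    W_U: shape (d_model, n_vocab), the unembedding matrix (assumed).
    vocab: list of n_vocab token strings.
    Returns (negative, positive), each a list of (token, score) pairs,
    most extreme first.
    """
    scores = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(scores)          # ascending: most negative first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy usage with a 2-dimensional model and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.9, -0.7],
                [0.3, 0.1, 0.0, 0.4]])
neg, pos = top_logit_effects(decoder_dir, W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('d', -0.7), ('b', -0.2)]
print(pos)  # [('c', 0.9), ('a', 0.5)]
```

Ties (such as the four entries at 0.57 above) come back in whatever order `np.argsort` produces; passing `kind="stable"` would make that order deterministic.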
Activation Density: 0.112%
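Activation density typically means the fraction of sampled tokens on which the feature is active, so 0.112% (0.00112) corresponds to roughly one token in every 900. A minimal sketch, assuming "active" means an activation strictly above zero; the threshold, the token sample, and the function name `activation_density` are assumptions, not details given by the dashboard.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which a feature fires (sketch).

    acts: 1-D sequence of the feature's activation on each sampled token.
    threshold: activations strictly above this count as "active";
               0.0 is an assumed default, not a documented cutoff.
    """
    acts = np.asarray(acts, dtype=float)
    return float(np.mean(acts > threshold))


# Toy usage: 2 active tokens out of 1000 gives a density of 0.2%.
acts = np.zeros(1000)
acts[[3, 500]] = 1.2
print(f"{activation_density(acts):.3%}")  # 0.200%
```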