Explanations
Phrases related to political debates or national security issues
Negative Logits
Token        Logit
sbm          -1.06
Smy          -0.78
cffffcc      -0.73
yip          -0.72
isse         -0.71
tele         -0.71
isexual      -0.71
aren         -0.68
qus          -0.67
hap          -0.65
Positive Logits
Token        Logit
plate         0.89
plates        0.81
ames          0.80
recognition   0.78
wash          0.77
urance        0.70
Nemesis       0.69
lessness      0.68
WAY           0.68
Saud          0.65
Activation Density: 0.012%