INDEX
Explanations
phrases related to government
phrases that indicate a focus or emphasis on specific groups or demographics
Head Attr Weights
0:0.04
1:0.02
2:0.13
3:0.06
4:0.28
5:0.04
6:0.02
7:0.02
8:0.21
9:0.07
10:0.03
11:0.02
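
The head weights above can be ranked to see where the attribution concentrates. The snippet below is a minimal sketch: the values are copied from the listing, while the helper name top_heads and the choice of k=3 are illustrative assumptions, not part of the dashboard.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {0: 0.04, 1: 0.02, 2: 0.13, 3: 0.06, 4: 0.28, 5: 0.04,
                6: 0.02, 7: 0.02, 8: 0.21, 9: 0.07, 10: 0.03, 11: 0.02}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weight (hypothetical helper)."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Sum of all listed weights; rounded because the inputs have two decimals.
total = round(sum(head_weights.values()), 2)

# The three heads carrying the most attribution, and their combined weight.
top = top_heads(head_weights)
top_share = round(sum(w for _, w in top), 2)

print(top)               # heads 4, 8 and 2
print(total, top_share)  # 0.94 0.62
```

Under these numbers, heads 4, 8 and 2 together account for 0.62 of the 0.94 total listed weight. The total falls short of 1.00, presumably because each displayed weight is rounded to two decimals.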
Negative Logits
nown          -1.44
famous        -1.41
footed        -1.41
minus         -1.40
formed        -1.39
brother       -1.35
transfer      -1.35
Reviewer      -1.33
fax           -1.32
Posted        -1.32
Positive Logits
insulting     1.44
ggle          1.42
cram          1.38
maximum       1.34
destro        1.32
ezvous        1.30
damaging      1.21
profiling     1.21
compromising  1.19
Gillespie     1.19
Activations Density 0.008%