INDEX
Explanations
mentions of specific groups or individuals in the context of news or politics
references to service members
Negative Logits
Token             Logit
Nichols           -0.72
Bauer             -0.70
hee               -0.67
=-=-=-=-=-=-=-=-  -0.66
Buffy             -0.66
Robbie            -0.65
bitterness        -0.65
ABV               -0.65
HIP               -0.64
Ryder             -0.64
Positive Logits
Token             Logit
emen              1.60
ghan              1.02
ovic              0.93
folk              0.93
ufact             0.90
estyles           0.89
ilitation         0.88
isphere           0.87
pread             0.86
ople              0.85
Activations Density 0.003%