INDEX
Explanations
references to the USA or related entities
references to the United States or its variations in spelling
Negative Logits
Token       Logit
rained      -0.96
dress       -0.85
croft       -0.84
rums        -0.81
bread       -0.80
smanship    -0.79
osc         -0.73
sheet       -0.71
clud        -0.71
lers        -0.71
Positive Logits
Token       Logit
BIL         1.06
terday      1.00
WAYS        0.96
BILITY      0.96
igslist     0.92
icago       0.86
ppa         0.84
velength    0.81
UNCH        0.81
xon         0.80
Activations Density: 0.015%