INDEX
Explanations
mentions of the city of Phoenix
references to the city of Phoenix
Negative Logits
ischer   -0.77
abet     -0.77
hirt     -0.76
redo     -0.72
ntil     -0.71
oys      -0.70
flows    -0.69
arger    -0.68
apter    -0.68
uther    -0.67
Positive Logits
Coyotes  1.17
Suns     1.08
Wright   0.97
Rising   0.90
Phoenix  0.88
Phoenix  0.84
angel    0.82
AZ       0.78
oning    0.76
ertodd   0.74
Activations Density 0.009%