Explanations
names of locations or organizations, specifically those starting with "New"
instances of the word "New"
Negative Logits
malt          -0.70
hurt          -0.67
tram          -0.64
stall         -0.63
kin           -0.61
orcs          -0.61
lean          -0.60
proxies       -0.60
FIG           -0.60
protein       -0.60
Positive Logits
New           3.46
new           2.16
New           2.14
NEW           2.14
NEW           1.72
Old           1.51
Los           1.37
NY            1.27
Chicago       1.22
Philadelphia  1.20
Activations Density: 0.021%