Explanations
references to public parks
mentions of parks
Negative Logits
Token       Logit
dilig       -0.72
rontal      -0.70
xit         -0.69
decomp      -0.68
ptoms       -0.67
nces        -0.64
initions    -0.63
theless     -0.62
soever      -0.62
oxid        -0.62
Positive Logits
Token         Logit
park          1.02
park          0.98
ranger        0.85
hurst         0.84
wright        0.82
our           0.80
conservancy   0.80
keeper        0.79
way           0.77
keeping       0.76
Activations Density: 0.014%