INDEX
Explanations
proper nouns
mentions of the name "Waters."
NEGATIVE LOGITS
Token          Logit
oard           -0.71
ERAL           -0.70
bable          -0.70
unal           -0.70
centralized    -0.70
ĻĤ             -0.67
Haiti          -0.64
prisoner       -0.64
uster          -0.64
OTS            -0.63
POSITIVE LOGITS
Token          Logit
Waters         1.45
waters         0.95
melon          0.94
cape           0.87
geist          0.87
combe          0.86
boro           0.86
hed            0.83
water          0.81
Cry            0.81
Activations Density 0.004%