INDEX
Explanations
mentions of the word "water"
references to water and its related contexts
Negative Logits
Reloaded      -0.92
ablishment    -0.87
ures          -0.81
eering        -0.80
eways         -0.72
ificant       -0.71
urities       -0.66
doms          -0.66
urations      -0.66
eki           -0.65
Positive Logits
melon          1.63
colour         1.12
proof          0.98
loo            0.97
marked         0.92
fluor          0.92
color          0.90
tight          0.90
falls          0.89
buffalo        0.89
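The logit lists above rank output tokens by how strongly the feature pushes their logits up or down. Most positive entries complete a compound that begins with "water" (watermelon, watercolour, waterproof, waterfalls, water buffalo). One common way to produce such lists, assumed here and not confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and then sort. The sketch below uses tiny hypothetical vectors. The names `logit_effects` and `top_and_bottom` are illustrative only and are not a library API.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    # Assumed method: a feature's effect on a token's logit is the dot product
    # of the feature's decoder direction with that token's unembedding vector.
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Largest effects first for the positive list, smallest first for the negative list.
    positive = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]
    negative = sorted(effects.items(), key=lambda kv: kv[1])[:k]
    return positive, negative

# Hypothetical 2-dimensional vectors, for illustration only.
decoder_dir = [1.0, 0.5]
unembed = {
    "melon": [1.0, 1.0],
    "proof": [0.5, 0.5],
    "ures": [-0.5, -1.0],
    "eki": [0.0, -0.2],
}

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=2)
print(positive)  # [('melon', 1.5), ('proof', 0.75)]
print(negative)  # [('ures', -1.0), ('eki', -0.1)]
```

On a real model the unembedding has one vector per vocabulary entry. The same ranking would then yield lists like the ones above, though the exact numbers depend on the model and on any normalization the dashboard applies.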
Activation Density: 0.024%
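Activation density is read here as the percentage of tokens in a sample on which the feature fires, meaning its activation is above zero. That definition and the helper `activation_density` are assumptions made for illustration. Under this reading, 0.024% is roughly 24 firing tokens per 100,000, so the feature is very sparse.

```python
from typing import Sequence

def activation_density(activations: Sequence[float]) -> float:
    """Percentage of tokens where the feature's activation is above zero (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)

# Hypothetical sample: the feature fires on 24 of 100,000 tokens.
sample = [1.0] * 24 + [0.0] * (100_000 - 24)
print(f"{activation_density(sample):.3f}%")  # 0.024%
```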