INDEX
Explanations
instances of the word 'salt', with a particularly high activation for the exact word 'salt'
occurrences of the word "salt."
Negative Logits
merce     -0.98
vernment  -0.82
mercial   -0.74
GROUP     -0.67
ITNESS    -0.66
hler      -0.66
WARN      -0.66
STEP      -0.65
ufact     -0.63
GS        -0.63
Positive Logits
water     1.13
zman      0.95
bite      0.93
illo      0.91
baths     0.89
iness     0.87
osal      0.85
isbury    0.83
Salt      0.83
iod       0.82
Activations Density 0.016%
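
As a rough illustration of how figures like these are usually read, here is a minimal sketch. It rests on two assumptions that this page does not state: that the density is the share of token positions where the feature activates above zero, and that each logit list holds the ten most extreme entries of a per-token effect score. The function names `activation_density` and `top_logits` are hypothetical, not taken from any particular tool.

```python
def activation_density(activations):
    """Fraction of token positions where the feature fires (activation > 0).

    Assumption: "density" means the share of nonzero activations.
    """
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


def top_logits(logit_effects, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    `logit_effects` is a hypothetical dict mapping a token string to the
    feature's effect on that token's output logit.
    """
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# A few values copied from the lists on this page, for illustration only.
effects = {"water": 1.13, "merce": -0.98, "bite": 0.93, "vernment": -0.82}
negative, positive = top_logits(effects, k=2)
print(negative)
print(positive)
```

Under that reading, a density of 0.016% would mean the feature is active on roughly 16 of every 100,000 token positions.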