Explanations
references to nutrition or nutritious foods
Negative Logits
Token     Logit
esty      -0.19
hem       -0.17
aign      -0.16
haf       -0.15
stral     -0.15
ãĥ¼ãĥŃ    -0.15
hood      -0.15
524       -0.15
yers      -0.15
otron     -0.14
Positive Logits
Token     Logit
ritional  0.33
meg       0.32
ty        0.30
rients    0.24
ters      0.24
rition    0.23
rient     0.22
shell     0.22
job       0.22
ted       0.21
Activations Density: 0.007%