INDEX
Explanations
words related to food and culinary experiences
Negative Logits
Token    Logit
p        -0.28
pod      -0.22
pole     -0.21
an       -0.20
pard     -0.20
erde     -0.20
pour     -0.18
er       -0.18
b        -0.18
erin     -0.18
Positive Logits
Token    Logit
mers     0.29
bers     0.28
ming     0.26
blers    0.25
pty      0.24
plings   0.24
mi       0.23
ptions   0.23
mins     0.23
pte      0.22
Activations Density: 0.025%