Explanations
references to food and beverages
Negative Logits
ors      -0.18
ept      -0.16
ante     -0.15
egg      -0.15
sol      -0.15
itre     -0.14
eping    -0.14
akes     -0.14
ORS      -0.14
nee      -0.14
Positive Logits
stuff      0.32
ie         0.26
borne      0.23
service    0.22
ies        0.20
gie        0.20
poisoning  0.20
st         0.19
IE         0.19
IES        0.18
Activations Density: 0.044%