INDEX
Explanations
mentions of food-related topics and consumption patterns
"food" followed by descriptive words
food, drug, water
Negative Logits
]";            -1.01
)"),           -0.99
)");           -0.95
."]            -0.92
BibitemShut    -0.92
\"");          -0.92
']],           -0.91
."),           -0.91
])]            -0.90
."</           -0.90
Positive Logits
-              0.66
y              0.63
z              0.63
w              0.63
H              0.62
f              0.62
b              0.61
P              0.60
man            0.59
k              0.59
Activations Density 0.577%