INDEX
Explanations
words related to baking or cooking
segments of text related to specific labels or categorization
Negative Logits
Token       Logit
Day         -0.79
Inc         -0.78
Rac         -0.76
Cove        -0.76
Ranked      -0.74
Disorder    -0.70
Poll        -0.70
Shades      -0.69
Year        -0.69
Parks       -0.69
Positive Logits
Token       Logit
iest        1.21
liest       1.06
osphere     1.02
hest        1.01
portion     0.91
hypothesis  0.90
agonist     0.89
omial       0.85
ultimate    0.85
ciation     0.85
Activations Density 0.492%
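
The page does not say how the density figure is computed. A common reading, assumed here, is the fraction of token positions at which the feature's activation is nonzero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all, which is one common definition, not a confirmed one.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 5 of which activate the feature.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.500%
```

Under this reading, a density of 0.492% would mean the feature fires on roughly 5 of every 1000 tokens in the sampled text.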