INDEX
Explanations
mentions of preservatives
references to specific individuals and their political affiliations
Negative Logits
  Lia        -0.70
  WN         -0.70
  extingu    -0.63
  Clause     -0.63
  fries      -0.62
  Camel      -0.61
  Wonder     -0.61
  itud       -0.60
  MacBook    -0.60
  Spit       -0.59
Positive Logits
  eters       2.79
  etts        2.14
  ervatives   2.08
  ervative    1.95
  Emanuel     1.26
  easing      1.11
  artisan     1.04
  eting       0.99
  uchin       0.98
  ets         0.97
Activation density: 0.025%
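To make the density figure concrete, here is a minimal sketch that assumes "activation density" means the fraction of tokens on which the feature's activation is strictly above zero. That definition, the function name `activation_density`, and the `threshold` parameter are illustrative assumptions, not taken from the dashboard. Under this reading, 0.025% works out to roughly one active token in every 4,000.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition for illustration only; the dashboard's exact
    computation is not stated in the source.
    """
    if not activations:
        raise ValueError("need at least one activation")
    # Count tokens where the feature fires above the threshold.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: one active token out of 4,000 tokens.
acts = [0.0] * 3999 + [1.7]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.025%
```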