INDEX
Explanations
statements that express opinions or beliefs
instances of the article "a" followed by various nouns
Negative Logits
Edit      -0.83
ioch      -0.78
mares     -0.71
items     -0.69
images    -0.68
aday      -0.68
Ub        -0.65
adding    -0.64
Age       -0.64
belts     -0.64
Positive Logits
hoax          1.08
legitimate    1.05
mistake       1.04
coincidence   1.02
ploy          0.95
nuisance      0.93
versive       0.93
viable        0.89
distraction   0.89
genuine       0.89
Activations Density 0.274%
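
The density figure above can be read as the share of sampled token positions on which this feature fires at all. The page does not say how it was computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active (activation strictly above threshold).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical toy data: 1000 token positions, 3 of which are active.
toy_activations = [0.0] * 997 + [0.4, 1.2, 2.5]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.274% would mean the feature is active on roughly 27 of every 10,000 sampled tokens.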