INDEX
Explanations
instances of people being criticized or attacked for various reasons
the word "for" in various contexts, indicating a focus on prepositional phrases
Negative Logits
illin    -0.82
edin     -0.72
atl      -0.71
Mine     -0.67
awan     -0.66
®        -0.64
NET      -0.64
abo      -0.62
mare     -0.61
nan      -0.61
Positive Logits
geries       1.10
daring       1.03
violating    1.02
failing      1.00
lack         0.95
reasons      0.94
refusing     0.93
questioning  0.89
gery         0.88
breaching    0.86
Activations Density: 0.144%