INDEX
Explanations
occurrences of the word "all."
references to inclusivity or universality in statements
Negative Logits
ubi      -0.67
ASAP     -0.61
mire     -0.59
Loading  -0.58
aria     -0.58
Hold     -0.58
poke     -0.57
andra    -0.57
raped    -0.57
well     -0.56
Positive Logits
imaginable   0.85
士           0.75
tabl         0.73
conceivable  0.71
hyde         0.69
Downloadha   0.64
intestinal   0.64
aspirin      0.63
Occupations  0.63
chens        0.62
Activations Density: 0.059%