INDEX
Explanations
phrases indicating a sense of inclusivity or universality
occurrences of the word "all"
Negative Logits
IDS      -0.61
aminer   -0.60
bal      -0.58
dt       -0.58
Caption  -0.56
inth     -0.56
oute     -0.56
FG       -0.55
hift     -0.55
ahime    -0.54
Positive Logits
ocating  1.18
igator   1.15
uding    1.11
usion    1.04
igators  1.03
usions   0.98
ocated   0.97
udes     0.96
uring    0.93
ocation  0.93
Activations Density: 0.074%