Explanations
instances of the word "did" and phrases involving negation
Negative Logits
antMatchers    -0.92
Composable     -0.89
HCR            -0.79
actuels        -0.78
Poss           -0.78
Possession     -0.77
actuelles      -0.75
Sambo          -0.74
viață          -0.74
NOPQRST        -0.73
Positive Logits
did            1.11
DID            1.11
Did            0.99
did            0.96
DID            0.92
Didi           0.84
didn           0.83
had            0.82
Did            0.79
었             0.78
Activations Density 0.130%