INDEX
Explanations
the presence of the word "be" in various contexts
Negative Logits
Token            Logit
picada           -0.61
aérienne         -0.60  (French, "aerial")
WriteTagHelper   -0.58
progressively    -0.58
atalytic         -0.58
År               -0.57  (Scandinavian, "year")
ilever           -0.57
XXIII            -0.57
skraft           -0.57
destruct         -0.57
Positive Logits
Token            Logit
Not              1.02
(!__             0.83
Not              0.82
NOT              0.78
Nicht            0.76   (German, "not")
就不是           0.76   (Chinese, "is simply not")
IsNot            0.70
necessarily      0.68
Bukan            0.68   (Indonesian/Malay, "not")
unlike           0.68
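Dashboards of this kind typically derive the logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive tokens. The sketch below assumes that method; `decoder_dir`, `unembed` (laid out as `[d_model][d_vocab]`), `vocab`, and `top_logits` are hypothetical names, and the toy numbers are illustrative only, not taken from this feature.

```python
def top_logits(decoder_dir, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs.

    Assumption: score[t] = sum_i decoder_dir[i] * unembed[i][t], i.e. the
    feature's decoder direction multiplied by the unembedding matrix W_U.
    """
    d_model = len(decoder_dir)
    # Dot product of the decoder direction with each token's unembedding column.
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by score: the front holds the most negative logits.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # reversed, so the most positive come first
    return negative, positive


# Toy example with d_model = 2 and a three-token vocabulary (made-up values).
vocab = ["Not", "be", "progressively"]
decoder_dir = [1.0, 0.5]
unembed = [[1.0, 0.0, -0.5],
           [0.4, 0.2, -0.2]]
neg, pos = top_logits(decoder_dir, unembed, vocab, k=1)
print(neg, pos)
```

In the toy run, "Not" scores about 1.2 and "progressively" about -0.6, so each lands at the top of its respective list.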
Activation Density: 0.219%
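Activation density is commonly read as the share of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the `activation_density` helper and its `threshold` parameter are hypothetical. Under that reading, 0.219% corresponds to roughly 2,190 firing positions per 1,000,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when activation > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy example: 2 active positions out of 1,000 tokens gives 0.2%.
acts = [0.0] * 998 + [0.7, 2.1]
print(activation_density(acts))
```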