INDEX
Explanations
repeated references to the pronoun "it."
Negative Logits
swear        -0.57
aware        -0.55
useDispatch  -0.53
]='\         -0.52
طيع          -0.49
sworn        -0.49
knew         -0.49
Controle     -0.49
_{[          -0.49
Orrell       -0.49
Positive Logits
is         0.76
consists   0.76
involves   0.74
occurs     0.73
occur      0.71
aarrggbb   0.69
consiste   0.66
comprises  0.66
comes      0.65
consist    0.65
Activations Density 0.279%