INDEX
Explanations
Instances of the first-person pronoun "I" (self-references by the assistant).
Negative Logits
cyclists     -0.07
_co          -0.07
_col         -0.07
-ob          -0.06
_input       -0.06
695          -0.06
(callback    -0.06
increment    -0.06
unto         -0.06
.runners     -0.06
Positive Logits
Tue          0.06
ekkür        0.06
milfs        0.06
STACK        0.06
Tại          0.06
saturated    0.06
KeyType      0.06
Infinity     0.06
bolster      0.06
suk          0.05
Activations Density 0.108%