INDEX
Explanations
first-person self-referential statements (the author saying what they did, need, or want).
Negative Logits
594         -0.07
massage     -0.06
45          -0.06
024         -0.06
разм        -0.06
969         -0.06
table       -0.06
زاده        -0.06
施          -0.06
Mohammed    -0.06
Positive Logits
I           0.07
!I          0.07
|[          0.06
-library    0.06
gql         0.06
итуа        0.06
I           0.06
/i          0.06
。我        0.06
якому       0.06
Activations Density: 0.044%