INDEX
Explanations
occurrences of the pronoun "I"
Negative Logits
Variables    -0.13
suming       -0.13
">//         -0.13
義           -0.13
apple        -0.13
éĶĭ          -0.12
гал          -0.12
بÙĪÙĦ        -0.12
oÄŁ          -0.12
aby          -0.12
Positive Logits
Wanna        0.28
Ain          0.25
Saw          0.25
Heard        0.24
Got          0.24
Found        0.23
Wish         0.23
WAN          0.22
Won          0.22
Hate         0.22
Activations Density 0.052%
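
As a rough illustration of what a density figure like the one above measures, the sketch below computes the fraction of token positions on which a feature fires. It assumes density means "share of sampled tokens with activation above zero"; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" here means the share of sampled tokens on which
    the feature is active. This is an illustrative definition only.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 10,000 token positions, 5 of which activate the feature.
sample = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.7, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.050%
```

Under this reading, a density of 0.052% would correspond to the feature firing on roughly 5 of every 10,000 sampled tokens.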