Explanations
The presence of the word "I" as an indicator of a personal perspective or first-person statements
Negative Logits
intptr          -0.89
                -0.78
ReusableCell    -0.76
CEPTION         -0.72
يتيمه           -0.71
Искәрмәләр      -0.70
UTTON           -0.68
Ques            -0.65
homen           -0.64
BRARY           -0.63
Positive Logits
I               2.71
I               2.08
My              1.26
We              1.25
Tôi             1.20
我              1.10
ฉัน             1.07
My              1.05
tôi             1.04
We              1.00
Activations Density 0.078%