INDEX
Explanations
statements of personal experience and introspection
Negative Logits
Theſe  -1.01
שוליים  -1.00  (Hebrew: "margins")
חיצוניים  -0.97  (Hebrew: "external")
ništvo  -0.96
виправивши  -0.94  (Ukrainian: "having corrected")
يتيمه  -0.90  (Arabic: "orphan")
themſelves  -0.90
ftagPool  -0.88
ſelf  -0.88
principalColumn  -0.88
Positive Logits
my  1.01
I  1.01
myself  0.79
my  0.72
私は  0.60  (Japanese: "I", with topic particle)
私の  0.60  (Japanese: "my")
me  0.60
I  0.58
My  0.57
ฉัน  0.56  (Thai: "I")
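Dashboards often build the two logit lists by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k lowest- and k highest-scoring tokens. The sketch below assumes that convention. The names `decoder_dir`, `unembed`, `logit_effects` and `top_and_bottom` are hypothetical, and the tool that produced the numbers here may apply normalization that is not shown.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot a (hypothetical) feature decoder direction with every vocab column.

    `unembed` is laid out as d_model rows x vocab columns, like W_U.
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal d_model")
    vocab = len(unembed[0])
    return [sum(decoder_dir[i] * unembed[i][v] for i in range(len(decoder_dir)))
            for v in range(vocab)]


def top_and_bottom(scores: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) tokens with their scores."""
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the vocabulary size")
    order = sorted(range(len(scores)), key=lambda v: scores[v])  # ascending
    bottom = [(tokens[v], scores[v]) for v in order[:k]]
    top = [(tokens[v], scores[v]) for v in reversed(order[-k:])]  # largest first
    return bottom, top


# Toy example: d_model = 2, vocabulary of four tokens.
tokens = ["my", "I", "ſelf", "the"]
W_U = [[1.0, 0.8, -0.9, 0.1],
       [0.5, 0.5, 0.5, 0.5]]
scores = logit_effects([1.0, 0.0], W_U)
bottom, top = top_and_bottom(scores, tokens, 2)
print("negative:", bottom)
print("positive:", top)
```

With this toy unembedding, "my" and "I" rank highest and "ſelf" ranks lowest. That is the same kind of split shown in the lists above, though here the values are invented for illustration.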
Activation Density: 0.844%
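Activation density is usually read as the percentage of sampled tokens on which the feature fires. The following is a minimal sketch. It assumes that "fires" means an activation strictly above a threshold of 0, and both the `activation_density` function and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed rule)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 8 firing tokens out of 1,000 sampled tokens.
sample = [0.0] * 992 + [0.5] * 8
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.844% means the feature is active on roughly 8 or 9 tokens out of every 1,000.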