Explanations
expressions of personal feelings or experiences
Negative Logits
Efq        -1.75
Monfieur   -1.70
Reſ        -1.66
houſe      -1.64
Theſe      -1.63
Houſe      -1.60
Eſ         -1.59
itſelf     -1.56
―――――      -1.55
Anſ        -1.55
Positive Logits
I          3.12
I          1.79
we         1.79
i          1.61
We         1.47
he         1.45
my         1.34
я          1.26
He         1.23
It         1.21
Activations Density: 0.265%
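
The density figure is not defined on this page. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below shows that calculation; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to reproduce a value of the same size.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature fires at all. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 53 active tokens out of 20,000.
sample = [0.0] * 19947 + [1.0] * 53
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.265% means the feature fires on roughly 1 token in 377, which fits a feature that responds mainly to first-person pronouns.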