Explanations
phrases indicating clarification or elaboration on previous statements
"In other words" or similar rephrasing
in other words
Negative Logits
Token       Logit
myſelf      -0.96
houſe       -0.95
Houſe       -0.95
itſelf      -0.95
Efq         -0.93
pleaſure    -0.91
―――――       -0.90
Majefty     -0.89
་་          -0.87
ſelves      -0.86
Positive Logits
Token       Logit
they        1.03
it          0.95
the         0.94
:           0.91
we          0.86
,           0.86
a           0.85
“           0.78
"           0.77
how         0.75
Activations Density 0.228%
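
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is computed: the fraction of token positions on which a feature's activation is above zero. This is an assumption about the metric, not a statement of how this page derived 0.228%; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density means "share of tokens on which the feature fires".
    The page itself does not define the metric.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical data: 1000 token positions, of which 2 activate the feature.
sample = [0.0] * 998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.200%
```

Under this definition, a density of 0.228% would mean the feature fires on roughly 2 to 3 of every 1000 tokens.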