INDEX
Explanations
phrases indicating impending danger or urgency
specific phrases and constructs related to existential concepts and ideas
Negative Logits
omorphic    -0.86
ģĸ          -0.69
1001        -0.63
onite       -0.61
odic        -0.61
brid        -0.60
Pub         -0.59
_>          -0.59
bda         -0.59
ovember     -0.59
Positive Logits
they        1.16
he          1.15
she         1.11
we          1.03
she         1.03
they        0.95
you         0.94
He          0.85
SHE         0.84
I           0.84
Activations Density: 0.270%