INDEX
Explanations
Descriptions related to various events or situations happening
Themes related to deception and manipulation in narrative contexts
Negative Logits
Token          Logit
´              -0.66
He             -0.65
isoft          -0.64
buquerque      -0.64
âľ             -0.63
bool           -0.63
HCR            -0.61
His            -0.60
iership        -0.60
VPN            -0.60
Positive Logits
Token          Logit
themselves     1.30
their          1.24
theirs         1.03
their          1.01
THEIR          0.90
they           0.85
Their          0.84
Their          0.77
selves         0.76
apiece         0.69
Activations Density 0.900%