Explanations
elements of political intrigue and relationships to authority or power dynamics
Negative Logits
utafitiHapana    -0.86
)':              -0.85
__':             -0.80
",$              -0.80
PhysRevLett      -0.79
awaiter          -0.79
)":              -0.79
)”.              -0.78
клопе            -0.78
."]              -0.78
Positive Logits
and              1.20
or               0.76
and              0.71
,                0.59
that             0.55
(                0.55
with             0.53
AND              0.51
And              0.50
connaissent      0.50
Activations Density: 0.312%