Explanations
phrases related to potential risks or consequences
themes related to actions, political maneuvers, and potential consequences or risks
Negative Logits
ITNESS        -0.76
Joined        -0.65
Annotations   -0.65
RIP           -0.61
andom         -0.59
DX            -0.58
Reader        -0.57
anted         -0.57
Registered    -0.56
Rated         -0.56
Positive Logits
would         1.52
would         1.41
wouldn        1.36
could         1.26
could         1.15
undermines    1.15
poses         1.11
might         1.10
entails       1.09
will          1.08
Activations Density: 0.333%