INDEX
Explanations
references to power dynamics and political negotiations
preceding modal verbs
modal verbs followed by verbs
Negative Logits
showcasing     -0.94
showcased      -0.88
prioritize     -0.88
impactful      -0.87
incentiv       -0.85
onboarding     -0.83
transitioning  -0.82
prioritizing   -0.82
referencing    -0.81
leveraging     -0.79
Positive Logits
daß        0.84
skall      0.77
muß        0.76
läßt       0.67
faßt       0.65
Himo       0.64
mußte      0.62
daardoor   0.60
rospy      0.59
doubtless  0.58
Activations Density 0.695%