INDEX
Explanations
phrases related to power dynamics and moral responsibility
references or mentions of the concept of freedom
Negative Logits
iPod         -0.65
Henri        -0.61
lay          -0.61
Continental  -0.61
Dele         -0.60
(blank)      -0.60
Crom         -0.57
Proud        -0.57
mandate      -0.57
Carrier      -0.57
Positive Logits
Ŀ  4.29
Ł  1.81
ľ  1.80
¡  1.76
ļ  1.73
ª  1.65
©  1.65
ĺ  1.65
Ĺ  1.61
ŀ  1.61
Activations Density 0.267%