Explanations
statements of distrust or negativity towards authority figures
Negative Logits
Token            Logit
resourceCulture  -0.96
jsPsych          -0.78
AccessorTable    -0.76
djangoproject    -0.74
adaptiveStyles   -0.72
fjspx            -0.69
ۣ                -0.68
referrerpolicy   -0.65
Autoritní        -0.64
feroit           -0.64
Positive Logits
Token            Logit
GEBURTSDATUM     0.72
also             0.64
like             0.60
especially       0.56
wanted           0.54
też              0.53
agrave           0.52
नहीं             0.52
maybe            0.51
uccio            0.49
Activations Density: 0.384%