INDEX
Explanations
references to authority and critical assessments of certain ideological perspectives
Negative Logits
(no token text)  -0.94
https            -0.84
https            -0.79
incentiv         -0.78
microbiome       -0.75
(no token text)  -0.72
impactful        -0.69
(no token text)  -0.67
curated          -0.66
LGBTQ            -0.65
Positive Logits
daß     1.17
mußte   1.08
mußten  1.06
muß     1.01
läßt    1.01
Daß     1.00
müßte   0.98
^(@)    0.96
faßt    0.95
wußte   0.93
Activations Density 5.775%