Explanations
statements about representation and social issues in media
Negative Logits
muß       -1.18
läßt      -1.09
daß       -1.08
müßte     -0.99
Надо      -0.99
Moslem    -0.93
^(@)      -0.92
idéia     -0.88
Надо      -0.88
Daß       -0.87
Positive Logits
Additionally     0.81
Alright          0.77
Additionally     0.72
incentiv         0.71
prioritizing     0.70
prioritize       0.70
impactful        0.69
aforementioned   0.68
createNewFile    0.67
newfound         0.66
Activations Density: 0.549%