Explanations
discussions of bias and opinion in political contexts
Negative Logits
здра  -0.40
슷  -0.38
InitStruct  -0.38
stanze  -0.38
compét  -0.37
engertian  -0.36
PullParser  -0.36
Rozm  -0.36
handleDelete  -0.35
tá  -0.35
Positive Logits
biased  0.60
bias  0.56
prejudiced  0.55
bias  0.54
biased  0.54
новништво  0.52
Bias  0.52
neutrality  0.50
biases  0.50
biases  0.50
Activations Density: 1.111%