INDEX
Explanations
references to protests and demonstrations related to social issues
Negative Logits
crow      -0.29
crowd     -0.28
crowds    -0.26
Crow      -0.25
Crowd     -0.24
Crow      -0.23
Crown     -0.19
crow      -0.19
mult      -0.18
crown     -0.18
Positive Logits
outside    0.86
outside    0.75
Outside    0.71
Outside    0.66
外         0.44
outsider   0.41
exterior   0.40
buiten     0.39
外         0.37
outsiders  0.36
Activations Density 0.213%