INDEX
Explanations
phrases related to human rights violations and political events
phrases related to social issues and minority groups
Negative Logits
*)          -0.70
!)          -0.62
hindsight   -0.57
)!          -0.56
autions     -0.56
VIDEOS      -0.56
)</         -0.54
-)          -0.52
?)          -0.51
broch       -0.50
Positive Logits
ãĢĤ         0.62
.",         0.61
èĢ          0.61
atever      0.59
".          0.58
TAMADRA     0.57
".          0.55
hene        0.55
respectively 0.54
',"         0.54
Activations Density: 1.771%