INDEX
Explanations
themes associated with conflict and societal issues
Negative Logits
Token       Logit
(“          -0.29
“           -0.24
âĢŀ         -0.20
”           -0.20
‘           -0.19
“â̦         -0.19
..↵↵↵↵      -0.19
=”          -0.17
()].        -0.17
""".        -0.17
Positive Logits
Token       Logit
"↵          0.37
":          0.36
"|          0.27
"↵↵         0.27
'↵          0.26
)↵          0.25
")↵         0.23
"'↵         0.23
"           0.23
"č↵         0.22
Activations Density: 0.169%