Explanations
geopolitical terms and entities, especially related to countries and political figures
references to North Korea and its allies
Negative Logits
%.      -0.78
.",     -0.76
"!      -0.75
.:      -0.75
.<      -0.74
!".     -0.73
.(      -0.71
."      -0.71
."[     -0.69
".      -0.69
Positive Logits
*)      1.10
)]      1.06
?)      0.98
)}      0.95
)]      0.92
?)      0.91
)\      0.87
})      0.86
)|      0.83
-)      0.83
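The negative and positive logit lists rank tokens by how strongly this feature pushes their output logits down or up. The following is a minimal sketch that assumes the lists use the common SAE-dashboard quantity: the feature's decoder direction dotted with each token's unembedding vector. Any normalization or layer-norm folding the dashboard may apply is not modeled. The function names, the 2-dimensional model, and the toy vocabulary are hypothetical and for illustration only.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def logit_effects(decoder_dir: Vector, unembed: Dict[str, Vector]) -> Dict[str, float]:
    """Direct effect of one feature on each token's logit.

    Assumption: `unembed` maps each token to its column of the unembedding
    matrix W_U, and the effect is the dot product decoder_dir . W_U[:, token].
    """
    return {
        tok: sum(d * w for d, w in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted and the k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    positive = list(reversed(ranked))[:k]  # largest effects first
    negative = ranked[:k]  # most negative effects first
    return positive, negative


# Hypothetical toy model: d_model = 2, four-token vocabulary.
toy_decoder_dir = [1.0, -0.5]
toy_unembed = {
    ")]": [1.0, 0.0],
    "?)": [0.4, -1.0],
    '."': [-0.5, 0.5],
    "%.": [-1.0, 0.2],
}

effects = logit_effects(toy_decoder_dir, toy_unembed)
positive, negative = top_and_bottom(effects, k=2)
for tok, val in positive:
    print(f"+ {tok:4} {val:.2f}")
for tok, val in negative:
    print(f"- {tok:4} {val:.2f}")
```

Sorting once and slicing from both ends yields the two lists from the same scores. This mirrors how a single feature can promote closing brackets while suppressing sentence-final punctuation.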
Activation Density: 2.100%
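Activation density is typically the fraction of tokens in an evaluation sample on which the feature has a nonzero activation. Under that reading, 2.100% means the feature fires on roughly 21 of every 1,000 tokens. The following is a minimal sketch that assumes this definition and a threshold of zero. The dashboard's actual token sample and threshold are not stated here, and the function name is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions at which the feature is active.

    Assumption: "active" means activation > threshold, with threshold 0
    for a ReLU-style SAE feature.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("activation_density needs at least one activation")
    return active / total


# Hypothetical sample: the feature fires on 21 of 1,000 tokens.
sample = [0.7] * 21 + [0.0] * 979
density = activation_density(sample)
print(f"{density:.3%}")
```

Running it prints `2.100%`, which matches the three-decimal percentage format shown above.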