Explanations
names of individuals involved in various news and political events
references to notable individuals and their actions or claims
Negative Logits
.:      -1.02
.(      -0.91
%.      -0.90
+.      -0.84
.<      -0.82
:(      -0.81
*.      -0.80
!".     -0.80
.       -0.79
.–      -0.79
Positive Logits
)]      0.93
)]      0.85
?)      0.83
*)      0.77
?)      0.76
})      0.74
')      0.71
)\      0.70
?),     0.69
)}      0.69
Activations Density 0.991%