INDEX
Explanations
mentions of specific dates along with related actions or events
mentions of violence or threats associated with specific events or individuals
Negative Logits
.''     -0.72
''.     -0.64
.�      -0.63
.""     -0.62
."[     -0.62
]."     -0.59
.","    -0.58
".[     -0.58
.''.    -0.57
âĢij    -0.56
Positive Logits
lez         0.54
Canaver     0.51
Blizz       0.50
Femin       0.50
Patreon     0.48
Dunham      0.48
adorable    0.48
Blizzard    0.48
Zoro        0.47
Pokemon     0.47
Activations Density 3.290%