Explanations
specific activities or events that took place
references to social media activities and incidents involving public figures
Negative Logits
"))      -0.63
)).      -0.60
])       -0.59
})       -0.59
]).      -0.57
});      -0.56
dele     -0.55
akers    -0.53
ĪĴ       -0.52
ao       -0.52
Positive Logits
shortly      0.91
insofar      0.87
via          0.86
because      0.86
sometime     0.84
alongside    0.81
along        0.75
primarily    0.75
pursuant     0.75
although     0.74
Activations Density: 1.112%