INDEX
Explanations
mentions of legal actions or accusations
subjects and their actions in a legal or regulatory context
Negative Logits
raft        -0.80
ournal      -0.79
hetti       -0.73
adium       -0.72
geries      -0.71
reating     -0.71
ictional    -0.71
ancial      -0.71
urgical     -0.69
ourgeois    -0.68
Positive Logits
sorely      0.88
dearly      0.85
gladly      0.81
describes   0.79
deems       0.78
presumably  0.78
evidently   0.77
swore       0.77
deem        0.76
dubbed      0.76
Activations Density: 0.184%