INDEX
Explanations
references to individuals involved in actions related to criminal activity or accusations
news-related sentences involving arrests, charges, or legal actions against individuals
Negative Logits
Token         Logit
mathemat      -0.82
neurons       -0.79
incumb        -0.78
equivalents   -0.75
trillions     -0.74
dividends     -0.72
corridors     -0.71
transcripts   -0.70
verbs         -0.70
coffers       -0.70
Positive Logits
Token         Logit
whose         0.91
whom          0.81
whose         0.74
hunt          0.73
named         0.73
who           0.73
who           0.69
gue           0.68
ivan          0.68
accidentally  0.68
Activations Density: 0.520%