Explanations
descriptions of people, particularly focusing on actions they are involved in
references to groups of people or individuals involved in events
Negative Logits
Token         Logit
DonaldTrump   -0.70
Bound         -0.68
cially        -0.68
atever        -0.62
Madness       -0.61
Bound         -0.59
Dome          -0.59
Domain        -0.59
igslist       -0.58
this          -0.58
Positive Logits
Token         Logit
folk          0.85
'             0.75
were          0.73
parted        0.72
']            0.72
wore          0.71
proceeded     0.71
resorted      0.70
complied      0.69
laughed       0.69
Activations Density 0.238%