Explanations
mentions of specific nationalities or professions
entities related to individuals involved in various social, political, and legal contexts
Negative Logits
ptions      -0.76
ologies     -0.70
mins        -0.66
akings      -0.65
winners     -0.65
lows        -0.64
periods     -0.64
ippers      -0.64
Pieces      -0.64
Racial      -0.62
Positive Logits
alyst          0.91
named          0.91
who            0.89
staffer        0.79
acquaintance   0.76
whose          0.76
colleague      0.73
worker         0.73
friend         0.72
looking        0.70
Activations Density: 0.402%