Explanations
mentions of specific organizations or entities, likely related to news or political contexts
references to a specific organization or framework
Negative Logits
McCartney   -0.65
Gutierrez   -0.64
Blackwell   -0.64
Wiley       -0.63
Brighton    -0.62
felt        -0.60
Cru         -0.60
Hatch       -0.58
Sirius      -0.58
ique        -0.58
Positive Logits
DF          1.38
DM          0.98
avorite     0.94
amily       0.93
raid        0.91
sg          0.90
WD          0.89
RF          0.88
yip         0.86
GF          0.84
Activations Density: 0.008%