INDEX
Explanations
mentions of past events or historical contexts
references to historical context and trends
Negative Logits
Token           Logit
ity             -0.75
plan            -0.65
Stuff           -0.62
Sahara          -0.62
EVA             -0.62
Eva             -0.61
ysis            -0.61
uli             -0.60
amins           -0.59
ation           -0.59
Positive Logits
Token           Logit
conduc          0.85
orically        0.78
oppressed       0.78
orical          0.77
disadvantaged   0.77
incapable       0.76
fraught         0.76
metic           0.73
unreliable      0.72
ctr             0.72
Activations Density: 0.074%