INDEX
Explanations
phrases related to being in charge or controlling something
references to groups, organizations, or entities that exert control or influence
Negative Logits
ayers      -0.83
peak       -0.73
ema        -0.73
Chomsky    -0.72
oult       -0.71
orrow      -0.70
hesda      -0.70
lime       -0.69
iatus      -0.69
eeks       -0.65
Positive Logits
smoothly       0.99
simulations    0.89
gam            0.87
err            0.86
affairs        0.86
auntlet        0.86
risk           0.84
efficiently    0.77
tests          0.76
ways           0.76
Activations Density 0.142%