INDEX
Explanations
phrases indicating acting in someone's interest or on someone's behalf
phrases that emphasize representation and advocacy on behalf of others
Negative Logits
unct       -0.62
gr         -0.60
olic       -0.59
alach      -0.59
cer        -0.57
fitting    -0.57
pol        -0.57
aucus      -0.57
Topic      -0.56
Okin       -0.56
Positive Logits
steps      0.84
selves     0.68
agents     0.68
ivas       0.65
farious    0.64
ombat      0.64
²¾         0.63
stretched  0.63
endeavors  0.63
issance    0.62
Activations Density: 0.160%