INDEX
    Explanations

    phrases related to authority figures making statements or giving instructions

    Negative Logits
    Token        Logit
    " cahier"    -0.86
    " cannes"    -0.85
    " peculi"    -0.84
    " emphat"    -0.81
    "Huhu"       -0.80
    " agi"       -0.80
    " fte"       -0.80
    " sembl"     -0.79
    " velours"   -0.78
    " bourg"     -0.77
    Positive Logits
    Token        Logit
    " told"      0.78
    " tell"      0.75
    " how"       0.71
    " about"     0.70
    " tells"     0.68
    " telling"   0.63
    "tell"       0.63
    " what"      0.62
    " Told"      0.61
    "<bos>"      0.61
    Act Density 0.140%
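
    As a rough illustration of what an activation density figure like this expresses, the sketch below assumes density means the fraction of tokens on which the feature's activation is above zero. That definition, the function name activation_density, its threshold parameter, and the sample data are all assumptions made for illustration; none of them come from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: a token counts as active when its activation is
    strictly greater than `threshold` (hypothetical parameter, default 0).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up data: 10,000 tokens, 14 of which activate the feature.
sample = [0.0] * 9986 + [1.5] * 14
density = activation_density(sample)
print(f"{density:.3%}")
```

    Under this reading, a density of 0.140% would mean the feature fires on roughly 14 of every 10,000 tokens.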

    No Known Activations