INDEX
    Explanations

    phrases expressing ambition and dedication

    Negative Logits
    Token       Logit
    "weg"       -0.16
    "most"      -0.15
    "/by"       -0.15
    "quier"     -0.15
    "/up"       -0.15
    "atee"      -0.14
    "/from"     -0.14
    "xit"       -0.14
    "ritten"    -0.14
    "like"      -0.14
    Positive Logits
    Token         Logit
    " harder"     0.26
    " towards"    0.25
    " toward"     0.24
    " hardest"    0.23
    "-hard"       0.22
    " hard"       0.21
    "Towards"     0.19
    "hard"        0.19
    " Towards"    0.18
    "ToFit"       0.18
    Activation Density: 0.012%

    No Known Activations