    Explanations

    phrases that indicate challenges or obstacles

    Negative Logits
    "pearance"    -0.15
    "strap"       -0.15
    "zc"          -0.15
    "macen"       -0.14
    "akan"        -0.13
    "luet"        -0.13
    "/Branch"     -0.13
    "iaux"        -0.13
    "fuck"        -0.13
    "foon"        -0.13
    Positive Logits
    "ly"          0.31
    "/im"         0.25
    " task"       0.23
    "-to"         0.22
    "/exp"        0.22
    " terrain"    0.21
    " khăn"       0.21
    "icult"       0.20
    " tasks"      0.20
    "ies"         0.19
    Act Density 0.053%

    No Known Activations