INDEX
    Explanations

    phrases indicating ongoing or repeated actions

    Negative Logits
    "assi"          -0.17
    "tn"            -0.16
    "terminal"      -0.14
    "orb"           -0.14
    "oreach"        -0.13
    "ryn"           -0.13
    "top"           -0.13
    "traction"      -0.13
    "ely"           -0.13
    "roph"          -0.13

    Positive Logits
    "lename"         0.15
    "aight"          0.15
    "stylesheet"     0.14
    " U"             0.14
    "ugo"            0.14
    " Sund"          0.14
    " Wilkinson"     0.14
    "Conv"           0.14
    "CLA"            0.14
    "49"             0.13
    Act Density 0.197%

    No Known Activations