INDEX
    Explanations

    phrases indicating consistency and reliability in behavior or actions

    Negative Logits
    " often"         -0.20
    " ikke"          -0.18
    " tidak"         -0.18
    " Often"         -0.18
    " altogether"    -0.17
    " không"         -0.17
    " не"            -0.17
    "oder"           -0.17
    " souvent"       -0.17
    " artık"         -0.17

    Positive Logits
    " been"           0.25
    "cky"             0.20
    "greens"          0.20
    " seemed"         0.19
    " seems"          0.19
    " seem"           0.19
    "ready"           0.18
    "green"           0.18
    " gonna"          0.18
    "-on"             0.17
    Activation Density: 0.067%

    No Known Activations