    Explanations

    phrases describing tendencies or inclinations

    Negative Logits
    Token        Logit
    "zbek"       -0.68
    "yz"         -0.63
    "gur"        -0.62
    " Polo"      -0.62
    "loo"        -0.55
    "pelling"    -0.54
    "terday"     -0.53
    "ZA"         -0.52
    " Slate"     -0.52
    "fts"        -0.52
    Positive Logits
    Token        Logit
    "rils"       1.33
    "entious"    1.24
    " toward"    1.14
    " towards"   1.07
    " to"        1.01
    "ril"        0.93
    "entimes"    0.90
    "erest"      0.84
    "ered"       0.79
    "erers"      0.75
    Activation Density: 0.035%

    No Known Activations