INDEX
    Explanations

    words related to tendencies or inclinations

    repeated phrases indicating tendencies or patterns in behavior

    Negative Logits
    "gur"        -0.73
    "arta"       -0.72
    "lain"       -0.67
    "yz"         -0.65
    " Bunker"    -0.60
    "fil"        -0.60
    " Agenda"    -0.59
    "ft"         -0.58
    "ZA"         -0.56
    "zbek"       -0.56
    Positive Logits
    "rils"       1.29
    "entious"    1.12
    "ril"        1.00
    "erers"      0.89
    "entimes"    0.89
    "erer"       0.87
    "erest"      0.87
    "uce"        0.82
    "ensical"    0.81
    "eman"       0.77
    Activation Density: 0.014%

    No Known Activations