INDEX
    Explanations

    words related to behavioral tendencies or characteristics

    phrases indicating tendencies or behavioral patterns

    Negative Logits
    Token       Logit
    "gur"       -0.80
    "arta"      -0.79
    "lain"      -0.70
    "yz"        -0.64
    "fil"       -0.64
    "zbek"      -0.62
    "aban"      -0.61
    "ania"      -0.60
    "ZA"        -0.59
    "oğ"        -0.58
    Positive Logits
    Token       Logit
    "rils"      1.36
    "entious"   1.03
    "ril"       0.96
    "erest"     0.85
    "entimes"   0.85
    "erers"     0.81
    "erer"      0.81
    "uce"       0.80
    " toward"   0.77
    "ensical"   0.75
    Activation Density: 0.015%

    No Known Activations