INDEX
    Explanations

    words associated with behavioral concepts

    Negative Logits
    Token      Logit
    eon        -0.20
    asz        -0.18
    893        -0.16
    tiles      -0.16
    ean        -0.15
    enaire     -0.15
    uguay      -0.15
    erate      -0.15
    ties       -0.15
    ecko       -0.15

    Positive Logits
    Token      Logit
    emoth       0.44
    older       0.31
    aviors      0.31
    old         0.29
    emo         0.28
    aviour      0.27
    aviours     0.27
    olders      0.26
    emouth      0.26
    olding      0.26

    Act Density 0.010%

    No Known Activations