    Explanations

    words related to behavior and actions

    Negative Logits
    Token       Logit
    " Pr"       -0.15
    "quiz"      -0.15
    " pr"       -0.15
    "gers"      -0.14
    " Bowen"    -0.14
    "eum"       -0.14
    "iones"     -0.13
    "igers"     -0.13
    " Right"    -0.13
    "enade"     -0.13
    Positive Logits
    Token       Logit
    "emoth"     0.25
    "beh"       0.25
    " beh"      0.23
    "aviour"    0.23
    " Beh"      0.22
    "aviors"    0.21
    "Beh"       0.20
    "avior"     0.19
    "avour"     0.18
    "aviours"   0.18
    Activation Density: 0.010%

    No Known Activations