    Explanations

    positive associations with brightness and optimism

    Negative Logits
    hort          -0.16
    stí           -0.16
    hlen          -0.15
    ationToken    -0.14
    rowser        -0.14
    dds           -0.14
    hape          -0.13
    .Frame        -0.13
    hiro          -0.13
    Gravity       -0.13
    Positive Logits
    ening         0.43
    ened          0.35
    -eyed         0.35
    en            0.32
    eners         0.29
    ens           0.29
    eyed          0.28
     eyed         0.28
    side          0.28
    ener          0.27
    Act Density 0.029%

    No Known Activations