    Explanations

    words associated with happiness or expressions of gladness

    Negative Logits
    "erli"       -0.17
    "er"         -0.15
    "i"          -0.15
    "frau"       -0.15
    "etics"      -0.15
    "ersed"      -0.14
    "lectual"    -0.14
    "ersh"       -0.14
    "eri"        -0.14
    "werk"       -0.14
    Positive Logits
    "ys"         0.24
    "stone"      0.22
    "ness"       0.20
    "win"        0.19
    " tid"       0.18
    "ewater"     0.18
    "wyn"        0.17
    "tid"        0.17
    "dest"       0.17
    "STONE"      0.17
    Act Density 0.005%

    No Known Activations