INDEX
    Explanations

    words related to falling or decline

    Negative Logits
    ed         -0.27
    ores       -0.24
    of         -0.23
    o          -0.23
    off        -0.23
    ovice      -0.22
    eer        -0.22
    ovich      -0.21
    oi         -0.21
    oit        -0.20

    Positive Logits
    llll        0.33
    l           0.31
    ows         0.29
    IGENCE      0.25
    eries       0.23
    t           0.23
    ll          0.23
    usions      0.22
    mann        0.22
    ustr        0.21

    Act Density 0.091%

    No Known Activations