    Explanations

    occurrences of adjectives and adverbs that describe characteristics or actions

    Negative Logits
    Token         Logit
    " Bach"       -0.15
    "ebin"        -0.14
    "ussed"       -0.14
    "esModule"    -0.14
    "ewolf"       -0.14
    "idon"        -0.14
    "idel"        -0.14
    "imar"        -0.14
    " Plum"       -0.13
    "lish"        -0.13

    Positive Logits
    Token         Logit
    "rze"         0.15
    "izi"         0.15
    "rž"          0.14
    "Synopsis"    0.14
    "isz"         0.14
    "ady"         0.13
    "OLA"         0.13
    "ë¦"          0.13
    " boa"        0.13
    " Gör"        0.13

    Activation Density: 0.178%

    No Known Activations