INDEX
    Explanations

    adjectives that describe qualities and characteristics

    Negative Logits
    Token     Logit
    rike      -0.16
    eya       -0.16
    onto      -0.16
    ÙĴس       -0.15
    rikes     -0.15
    illage    -0.15
    illow     -0.15
    lite      -0.15
    bes       -0.15
    eso       -0.14
    Positive Logits
    Token     Logit
     enough   0.19
    DITION    0.17
    ness      0.16
    ly        0.16
    izza      0.15
    ibar      0.14
    ÙĤدر      0.14
    ä¸Ķ       0.14
    raig      0.14
    tvrt      0.14
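
    Some of the tokens listed above (for example ä¸Ķ) look like byte-level BPE token strings rather than readable text. The sketch below shows one way to map such a string back to UTF-8, assuming the tokenizer uses the GPT-2 style byte-to-unicode alphabet. That assumption is not confirmed by this page, and the helper names bytes_to_unicode and decode_token are illustrative only. Strings that mix in characters from outside that alphabet (as ÙĤدر appears to) cannot be decoded this way, so the helper returns None for them.

```python
def bytes_to_unicode():
    """Build the GPT-2 style mapping from byte values to printable characters.

    Assumption: the tokenizer behind this page uses this byte-level alphabet.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAD))
        + list(range(0xAE, 0x100))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse mapping: printable character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Return the UTF-8 text for a byte-level token string.

    Returns None if the token contains a character outside the byte alphabet.
    """
    try:
        raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    except KeyError:
        return None
    # Partial multi-byte sequences are shown with the replacement character.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ness", "\u00e4\u00b8\u0136", "\u0120enough"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

    Plain ASCII tokens such as ness pass through unchanged, so the helper can be applied to a whole logit table without special-casing.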
    Act Density 0.302%

    No Known Activations