INDEX
    Explanations

    adjectives and quantifiers that express degree or quantity

    Negative Logits
    raig          -0.17
    etwork        -0.16
    DO            -0.15
    ahrain        -0.15
    noop          -0.14
    vil           -0.14
     Rever        -0.14
    nette         -0.14
    juan          -0.13
    lobs          -0.13
    Positive Logits
     Maur         0.17
    urdy          0.16
    (Attribute    0.14
    uros          0.14
    JT            0.14
    kla           0.14
    dden          0.13
    živ           0.13
    addin         0.13
    bury          0.13
    Act Density 1.118%

    No Known Activations