INDEX
    Explanations

    references to standards or criteria in various contexts

    Negative Logits
    Token         Logit
    "age"         -0.18
    "kind"        -0.17
    "strar"       -0.17
    "NEL"         -0.16
    "mbH"         -0.16
    "uggy"        -0.16
    "istry"       -0.15
    "ording"      -0.15
    "ornings"     -0.15
    " staring"    -0.14
    Positive Logits
    Token         Logit
    "-setting"    0.22
    " setters"    0.18
    "heets"       0.17
    " setter"     0.17
    "llib"        0.16
    " gap"        0.16
    "/go"         0.15
    "754"         0.15
    "arias"       0.15
    " impro"      0.15
    Activation Density: 0.018%

    No Known Activations