    Explanations

    text related to linguistic theory and grammatical principles

    Negative Logits
    "<bos>"        -1.07
                   -0.92
    "iddhar"       -0.81
    " Mlle"        -0.78
    " quitted"     -0.77
    "/**"          -0.75
    " hentai"      -0.75
    " disambigu"   -0.73
    " shenan"      -0.71
    " gild"        -0.67
    Positive Logits
    " parameter"   1.22
    "parameter"    1.20
    "Parameter"    1.16
    " Parameter"   1.14
    " param"       1.11
    " parameters"  1.11
    "Param"        1.11
    " Param"       1.09
    "parameters"   1.08
    "PARAM"        1.04
    Act Density 0.411%

    No Known Activations