    Explanations

    nouns and related terms indicating identity and categorization

    Negative Logits
    Token            Logit
    uco              -0.16
    ationToken       -0.16
    din              -0.15
    erli             -0.15
    Normalization    -0.14
    ži               -0.14
    isher            -0.14
    ingham           -0.14
    perature         -0.14
    acci             -0.14
    Positive Logits
    Token            Logit
    type             0.15
     quadr           0.15
    affer            0.15
    ovenant          0.15
    CLS              0.14
    vg               0.14
    Dash             0.14
    ãĤ·ãĥ£           0.14
    iy               0.14
    bat              0.14
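
    The positive-logit entry ãĤ·ãĥ£ resembles the printable stand-in form that GPT-2 style byte-level BPE vocabularies use for raw UTF-8 bytes. The page does not name the tokenizer, so this is an assumption; the token may equally be a literal vocabulary string. The sketch below rebuilds the GPT-2 byte-to-character table and inverts it to show what the entry would decode to under that assumption. The helper name `decode_token` is hypothetical.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2 byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte gets a stand-in code point starting at U+0100.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Decode a byte-level display string (hypothetical helper).

    Assumption: the string uses the GPT-2 stand-in alphabet. If any character
    falls outside that alphabet, the string is returned unchanged.
    """
    if any(ch not in UNICODE_TO_BYTE for ch in display):
        return display
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in display)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĤ·ãĥ£"))
```

    Under this assumption the entry corresponds to the katakana sequence シャ. Entries such as ži contain characters outside the stand-in alphabet and are left unchanged, which suggests they are already shown in decoded form.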
    Activation Density: 0.049%

    No Known Activations