INDEX
    Explanations

    occurrences of class-related terms and numerical references

    Negative Logits
    " Ch"        -0.20
    " ch"        -0.18
    "Ch"         -0.18
    "ODB"        -0.16
    "763"        -0.15
    " Tavern"    -0.15
    "ience"      -0.15
    ".ch"        -0.14
    "DEX"        -0.14
    " Seven"     -0.14

    Positive Logits
    "ere"        0.18
    "erek"       0.17
    "ÏĮμε"       0.15
    "ãĤ¹ãĥŀ"     0.15
    "à¤"         0.15
    "ergy"       0.14
    " masc"      0.14
    "ãĥ¬"        0.14
    "im"         0.14
    "ering"      0.14

    Act Density 0.040%

    No Known Activations