    Explanations

    specific mentions or references in text

    occurrences of the word "mentions."

    Negative Logits
    Token         Logit
    "sett"        -0.85
    "orneys"      -0.78
    "arine"       -0.70
    "ridge"       -0.70
    " padd"       -0.68
    "squ"         -0.68
    "iership"     -0.68
    "otypes"      -0.66
    "psons"       -0.65
    "ascript"     -0.64
    Positive Logits
    Token           Logit
    " mentions"     1.17
    " mentioning"   1.07
    " mention"      0.92
    "ãĤ¼ãĤ¦ãĤ¹"     0.84
    "ij士"          0.78
    "marks"         0.75
    " Vegeta"       0.73
    "ãĤ®"           0.72
    " è£ı"          0.72
    "places"        0.72
    Activation Density: 0.007%

    No Known Activations