INDEX
    Explanations

    specific numbers or numerical information in the text

    parentheses and similar punctuation

    Negative Logits
     hid            -0.85
     resur          -0.80
     habit          -0.74
     haunted        -0.74
     tongue         -0.73
     hiding         -0.73
     personality    -0.72
     bro            -0.72
     breed          -0.71
     firsthand      -0.70
    Positive Logits
    excluding        1.89
    including        1.68
    average          1.66
    minimum          1.64
    approximately    1.62
    depending        1.58
    meaning          1.52
    adjusted         1.50
    total            1.49
    according        1.48
    Activation Density: 0.126%

    No Known Activations