INDEX
    Explanations

    comparisons in the context of improvement or deterioration

    evaluative language related to improvement and decline

    Negative Logits
     entirety      -0.77
    heid           -0.70
    iao            -0.65
     hemisphere    -0.64
     Lau           -0.63
     halves        -0.60
    apple          -0.60
     holder        -0.59
    ellow          -0.59
     Hong          -0.58
    Positive Logits
     sidx          0.86
     mileage       0.83
     traction      0.83
    ãĤ¼            0.82
    noticed        0.80
    ModLoader      0.79
     veter         0.77
    retty          0.74
    wcs            0.72
     puberty       0.71
    Activation Density: 0.088%

    No Known Activations