INDEX
    Explanations

    texts written in a specific font style

    instances of numbered bullet points or rankings

    Negative Logits
     withdrawal     -0.67
     attent         -0.66
     expenditure    -0.62
     consideration  -0.61
     conversion     -0.61
     opt            -0.60
     elimination    -0.60
     glare          -0.60
    hement          -0.60
     deprivation    -0.60
    Positive Logits
    since           0.87
    Anonymous       0.86
    advertisement   0.85
    THIS            0.82
    Black           0.79
    ãĥ´             0.78
    eq              0.76
    BU              0.76
    yet             0.76
    jer             0.76
    Activation Density: 0.217%

    No Known Activations