INDEX
    Explanations

    adjectives that express opinions or judgments

    phrases expressing opinions or evaluations

    Negative Logits
    ãĥīãĥ©        -0.74
    arthed        -0.68
    andise        -0.67
     parcels      -0.65
    ãĤº           -0.64
    è¦ļéĨĴ        -0.63
     gust         -0.62
    dden          -0.62
    translation   -0.61
    éĹĺ           -0.60
    Positive Logits
     underest         0.76
     underestimate    0.75
     kidding          0.69
     overest          0.66
     underestimated   0.66
     smarter          0.64
     worthwhile       0.63
     miscon           0.63
    ono               0.62
     deserved         0.61
    Activation Density: 0.336%

    No Known Activations