INDEX
    Explanations

    text related to quantifying amounts or levels

    phrases related to categorization or classification

    Negative Logits
     partName     -0.61
    arton         -0.60
    USS           -0.55
    gov           -0.53
     Adren        -0.52
     Journalism   -0.52
     Stras        -0.51
     HuffPost     -0.51
    YN            -0.50
    atform        -0.50
    Positive Logits
     equivalents  0.70
    ".[           0.69
     apiece       0.69
     destro       0.68
    .",           0.66
    .''.          0.65
     whereas      0.65
     disadvant    0.64
    ."[           0.63
    !".           0.61
    Act Density 1.970%

    No Known Activations