INDEX
    Explanations

    phrases related to various topics such as community issues, health care reform, and economic policies

    Negative Logits
    " Lauder"         -0.87
    "Reviewer"        -0.77
    "spin"            -0.70
    " smear"          -0.69
    "fman"            -0.67
    " resorts"        -0.66
    " Britons"        -0.65
    " shifts"         -0.65
    " Authorities"    -0.65
    " Numbers"        -0.65

    Positive Logits
    "atisf"            1.06
    "selves"           1.02
    " happening"       1.00
    "uddenly"          1.00
    "ought"            0.99
    "̶"                 0.99
    "kaya"             0.98
    "omething"         0.97
    "lightly"          0.97
    " happened"        0.94

    Act Density 0.577%

    No Known Activations