INDEX
    Explanations

    mentions of statements or positions made by political figures

    Negative Logits
    heter           -0.99
    assis           -0.99
     ILCS           -0.94
     conditioning   -0.89
    phrine          -0.86
     Adin           -0.84
    76561           -0.83
    Fram            -0.81
    WARE            -0.81
    ASE             -0.77
    Positive Logits
     sure           1.33
    hift            1.15
     headlines      1.11
    ailable         1.11
     strides        1.08
    ÄŁ              1.08
    netflix         1.06
    undo            1.04
    awed            1.03
    itions          1.03
    Act Density 1.595%

    No Known Activations