    Explanations

    sentences discussing viewpoints or statements attributed to various groups or individuals

    statements or claims attributed to various parties or individuals

    Negative Logits
    Token           Logit
    "ĸļ士"          -0.80
    "cffffcc"       -0.79
    " Written"      -0.74
    "theless"       -0.70
    "ptives"        -0.68
    "written"       -0.68
    "tele"          -0.67
    "dinand"        -0.65
    "ãĤ"            -0.65
    "productive"    -0.65
    Positive Logits
    Token           Logit
    " goodbye"      1.07
    "olate"         0.70
    " hello"        0.67
    " they"         0.67
    "IDA"           0.66
    " it"           0.66
    "olated"        0.66
    " NAD"          0.66
    "ansky"         0.65
    " MSG"          0.64
    Activation Density 0.122%

    No Known Activations