INDEX
    Explanations

    phrases related to policies or advocacy

    Negative Logits
    ĸļ              -1.03
    ãĤ¼ãĤ¦ãĤ¹       -0.93
     Sins           -0.76
     Halls          -0.72
     Gorge          -0.69
     Twain          -0.68
     similarities   -0.64
     gorge          -0.64
     sinks          -0.64
    owship          -0.64

    Positive Logits
    digy            1.52
    verbs           1.24
    ctor            1.18
    actively        1.18
    dding           1.16
    pelling         1.16
    strate          1.15
    ccess           1.10
    gression        1.06
    dig             1.06
    Activation Density: 0.012%

    No Known Activations