    Explanations

    mentions of official titles or positions of authority

    statements made by officials or representatives

    Negative Logits
    abiding                   -0.54
    tumblr                    -0.51
    successfully              -0.50
     Âł Âł Âł Âł               -0.50
    pires                     -0.49
     diaper                   -0.49
     Âł Âł Âł Âł Âł Âł Âł Âł     -0.48
     miracle                  -0.48
     stret                    -0.47
     Articles                 -0.46

    Positive Logits
    ]."                        0.64
    anton                      0.60
     adding                    0.59
    ].                         0.57
    zinski                     0.54
    ¥µ                         0.54
    >.                         0.53
    }.                         0.52
    heny                       0.52
    izer                       0.52
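    The Negative and Positive Logits lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The sketch below shows one way such a ranking could be produced once a per-token effect vector is available. It assumes `effects` already holds one value per vocabulary token (for example, the feature's decoder direction multiplied by the model's unembedding matrix; that derivation is an assumption, not something stated on this page), and `top_logit_effects` is a hypothetical helper name.

```python
def top_logit_effects(effects, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    Assumption: effects[i] is the feature's logit effect on token vocab[i].
    Tokens keep any leading space, as in the lists above.
    """
    if len(effects) != len(vocab):
        raise ValueError("effects and vocab must have the same length")
    pairs = list(zip(vocab, effects))
    # Sort ascending by effect: most negative first.
    ranked = sorted(pairs, key=lambda p: p[1])
    negative = ranked[:k]
    # Reverse the ascending order so the most positive comes first.
    positive = ranked[::-1][:k]
    return negative, positive


if __name__ == "__main__":
    # Four entries taken from the lists above, used as a tiny vocabulary.
    neg, pos = top_logit_effects(
        [-0.54, -0.51, 0.60, 0.59], ["abiding", "tumblr", "anton", " adding"], k=2
    )
    print(neg)  # [('abiding', -0.54), ('tumblr', -0.51)]
    print(pos)  # [('anton', 0.6), (' adding', 0.59)]
```

    For a full vocabulary, `heapq.nsmallest` and `heapq.nlargest` avoid sorting every token, but a plain sort keeps the sketch easy to follow.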
    Activation Density 0.660%
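    Activation density is the share of tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly greater than a threshold of 0 (the threshold is an assumption, and `activation_density` is a hypothetical helper name):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature counts as active when activation > threshold (default 0).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 66 of 10,000 tokens.
    sample = [0.0] * 9934 + [1.0] * 66
    print(activation_density(sample))  # 0.66
```

    Under this definition, a density of 0.660% corresponds to roughly 66 active tokens per 10,000.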

    No Known Activations