INDEX
    Explanations

    phrases related to societal issues and global politics

    the presence of specific symbols or characters in the text

    Negative Logits
    " buggy"         -0.74
    " decomp"        -0.71
    " scattering"    -0.71
    " scatter"       -0.70
    " smokes"        -0.69
    "anwhile"        -0.69
    "glers"          -0.68
    " rooting"       -0.68
    " lodging"       -0.68
    " dumping"       -0.67
    Positive Logits
    "£"              1.09
    "º"              0.97
    "¹"              0.94
    "âĹ"             0.89
    "Serv"           0.88
    "»"              0.86
    "®"              0.86
    "¡"              0.86
    "Hon"            0.86
    " âĢº"           0.83
    Activation Density: 0.266%

    No Known Activations