INDEX
    Explanations

    references to violent actions or crimes involving physical harm to individuals

    Negative Logits
    Sqft            -0.70
     meras          -0.69
     kemer          -0.66
    AnchorStyles    -0.64
    bitat           -0.62
     ekos           -0.61
    postIndex       -0.60
     plis           -0.59
     quí            -0.59
    \{\\            -0.58

    Positive Logits
     carrefour      0.83
     désert         0.83
     suivie         0.81
     jurassic       0.78
     joyeux         0.77
     fameux         0.77
     mystère        0.75
    Yeet            0.75
     Wtf            0.74
     triomphe       0.74
    Activation Density: 0.183%

    No Known Activations