INDEX
    Explanations

    name calling or name change

    Negative Logits
    Token             Logit
    "noticia"         -0.79
    " potenciales"    -0.79
    " больших"        -0.78
    " angefangen"     -0.78
    "angaben"         -0.78
    "Tarea"           -0.75
    " există"         -0.75
    "Kenapa"          -0.74
    " zuhause"        -0.74
    "Kennedy"         -0.71
    Positive Logits
    Token             Logit
    " dropping"        1.04
    " name"            1.02
    "tags"             0.94
    " names"           0.92
    " puisi"           0.91
    "NOPQRST"          0.91
    " dropper"         0.90
    "tag"              0.90
    "plates"           0.88
    "npos"             0.88
    Activation Density: 0.027%

    No Known Activations