INDEX
    Explanations
    NEGATIVE LOGITS
    Token             Logit
                      -0.58
    "Портал"          -0.56
    "columnHeader"    -0.54
    "Wolverine"       -0.53
    "Juri"            -0.53
    "Fuzzy"           -0.53
    " lèvres"         -0.52
    " Dermal"         -0.52
    " Juri"           -0.52
    "Giới"            -0.52
    POSITIVE LOGITS
    Token             Logit
    " instead"         1.00
    "instead"          0.94
    "Instead"          0.91
    " Instead"         0.88
    "vece"             0.74
    " betweenstory"    0.60
    " вместо"          0.59
    " zamiast"         0.54
    "stead"            0.53
    "mtd"              0.52
    Activation Density: 0.011%

    No Known Activations