    Negative Logits
    Token                  Logit
    "UnsafeEnabled"        -0.78
    " ujednoznacz"         -0.71
    "VersionUID"           -0.70
    " Numerade"            -0.65
    " disambiguazione"     -0.64
    "Personendaten"        -0.62
    "KommentareTeilen"     -0.61
    "rungsseite"           -0.60
    " հղումներ"            -0.59
    "astéroïdes"           -0.59
    Positive Logits
    Token                  Logit
    "Thank"                1.27
    " Thank"               0.99
    " thank"               0.98
    "thank"                0.88
    "THANK"                0.73
    " THANK"               0.71
    "<bos>"                0.68
    "Vielen"               0.59
    "Thanks"               0.58
    " thanked"             0.56
    Act Density: 0.007%

    No Known Activations