    Explanations

    phrases that express simplicity, familiarity, or common experiences

    Negative Logits
    "tagHelper"      -0.68
    " zwar"          -0.65
    "httphttps"      -0.63
    "ailleurs"       -0.60
    " also"          -0.60
    "igens"          -0.59
    "Nonnull"        -0.58
    "certainly"      -0.58
    " également"     -0.58
    "esm"            -0.58
    Positive Logits
    " Simplemente"    0.84
    " simplesmente"   0.83
    " simply"         0.76
    " einfach"        0.76
    "Просто"          0.76
    " Просто"         0.76
    " prostu"         0.75
    " simplemente"    0.74
    "Simply"          0.71
    " zwy"            0.69
    Activation Density: 0.244%
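
The page does not define the density figure. A common reading, assumed here, is the share of sampled token positions on which the feature fires at all; under that reading 0.244% is roughly one token in 400. The sketch below shows that calculation on made-up data. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature
    is active at all; the source page does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 8 token positions; the feature fires on one.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 12.500%
```

A real dashboard would compute this over a much larger token sample, which is why a sparse feature ends up with a density well below 1%.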

    No Known Activations