INDEX
    Explanations

    The word "simple" and its variations, indicating a focus on simplicity or straightforwardness in the text.

    Negative Logits
    Token              Logit
    " Inscrivez"       -0.54
    "Karriere"         -0.50
    "HostException"    -0.49
    " tenure"          -0.47
    " Consultez"       -0.46
    "Välislingid"      -0.46
    "InStock"          -0.46
    "eryllium"         -0.45
    " Tenure"          -0.45
    "ronyms"           -0.45
    Positive Logits
    Token              Logit
    " simple"          1.13
    " SIMPLE"          1.09
    " Simple"          1.08
    "Simple"           1.08
    "simple"           1.05
    "SIMPLE"           0.98
    " simples"         0.96
    " semplici"        0.95
    " simpl"           0.92
    " einfachen"       0.90
    Activation Density: 0.038%

    No Known Activations