INDEX
    Explanations

    phrases indicating certainty or strong predictions about future actions or events

    Negative Logits
    Token                 Logit
    "ViewFeatures"        -0.78
    " AssemblyCulture"    -0.69
    " Verdi"              -0.64
    "entyfik"             -0.64
    " Ratna"              -0.64
    " Cowell"             -0.62
    " serine"             -0.60
    " Micron"             -0.60
    "ModelState"          -0.59
    "CORBA"               -0.59
    Positive Logits
    Token                 Logit
    " only"               1.22
    " Only"               1.01
    " лишь"               1.00
    "Only"                0.95
    " ONLY"               0.95
    "ONLY"                0.92
    "only"                0.88
    "Sólo"                0.85
    "Chỉ"                 0.83
    " onely"              0.81
    Activation Density: 0.121%

    No Known Activations