INDEX
    Explanations

    occurrences of the word "for"

    Negative Logits
     itself  -0.52
     then  -0.38
    今度は  -0.37
     méri  -0.37
     themselves  -0.35
     herself  -0.35
    <_>  -0.35
     âgé  -0.34
    (=)  -0.33
     médié  -0.33
    Positive Logits
    ########.  0.78
    WriteTagHelper  0.68
    ographics  0.64
    ReusableCell  0.62
     verſch  0.62
     teflon  0.61
     وتسجيلات  0.61
    AddTagHelper  0.60
     Signalez  0.60
    assium  0.60
    Activation Density 0.005%

    No Known Activations