    Explanations

    instances of the word "take" and its various forms

    Negative Logits
    "xes"               -0.15
    "いい"                -0.15
    "anity"             -0.14
    "UIT"               -0.14
    "گر"                -0.14
    "uisse"             -0.14
    "under"             -0.14
    "rox"               -0.13
    "orang"             -0.13
    "raith"             -0.13
    Positive Logits
    "aways"             0.17
    " into"             0.14
    " advantage"        0.14
    "inch"              0.14
    "иболее"            0.14
    " responsibility"   0.14
    "ALI"               0.13
    "bart"              0.13
    "Flight"            0.13
    "/sub"              0.13
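    Feature dashboards of this kind commonly derive the positive and negative logit lists from the feature's direct effect on the output vocabulary: the feature's decoder direction is multiplied by the model's unembedding matrix, and the highest and lowest entries are reported. The sketch below assumes that method and ignores any final LayerNorm; the function and argument names are hypothetical, and the toy numbers are not this feature's weights.

```python
def top_logit_effects(decoder_row, unembed, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    Hypothetical helper. Assumes:
      decoder_row: the feature's decoder direction, length d_model
      unembed:     W_U stored as d_model rows, each of length d_vocab
      vocab:       token strings, length d_vocab
    """
    d_model = len(decoder_row)
    d_vocab = len(vocab)
    # Effect on token t's logit = dot(decoder_row, column t of W_U).
    effects = [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(d_vocab)
    ]
    # Token indices sorted from most negative to most positive effect.
    order = sorted(range(d_vocab), key=lambda t: effects[t])
    negative = [(vocab[t], effects[t]) for t in order[:k]]
    positive = [(vocab[t], effects[t]) for t in order[::-1][:k]]
    return negative, positive


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
neg, pos = top_logit_effects(
    decoder_row=[1.0, 0.0],
    unembed=[[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]],
    vocab=["a", "b", "c"],
    k=1,
)
print(neg, pos)  # [('b', -0.2)] [('a', 0.5)]
```

    Sorting once and slicing from both ends yields both lists from a single pass over the vocabulary.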
    Activation Density: 0.162%
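    Activation density is the share of evaluated tokens on which the feature fires, so 0.162% corresponds to roughly 162 active tokens out of every 100,000. A minimal sketch, assuming a feature counts as active when its activation exceeds zero (the threshold and the function name are assumptions, not taken from the dashboard):

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Hypothetical helper; the zero threshold is an assumption.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 162 active tokens out of 100,000 gives a density of 0.162%.
acts = [0.0] * 99_838 + [1.3] * 162
print(f"{activation_density(acts):.3f}%")  # 0.162%
```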

    No Known Activations