    Explanations

    function words in explanation texts

    Negative Logits
     lying        -0.06
     tn           -0.06
     نمایش        -0.06
    _CONNECTED    -0.06
    Sketch        -0.06
     chairs       -0.06
     Boots        -0.06
    <?↵           -0.06
     Hera         -0.06
    907           -0.06

    Positive Logits
    :SetPoint     0.07
    06            0.07
    /A            0.06
    ","           0.06
    lsru          0.06
    raci          0.06
    minor         0.06
    (:            0.06
     Navbar       0.05
     Kaynak       0.05

    Act Density 0.089%

    No Known Activations