    Negative Logits
    Token         Logit
    'control'     -0.07
    'setWidth'    -0.07
    '(Language'   -0.07
    '{}".'        -0.06
    ' thrown'     -0.06
    '-light'      -0.06
    'ibur'        -0.06
    ' cosine'     -0.06
    ' Royal'      -0.06
    'monkey'      -0.06
    Positive Logits
    Token         Logit
    ' gap'        0.13
    ' Gap'        0.13
    ' gaps'       0.10
    'Gap'         0.10
    '_GAP'        0.09
    ' GAP'        0.08
    '-gap'        0.07
    '-opening'    0.07
    'rig'         0.07
    '_FS'         0.06
    Activation Density: 0.008%

    No Known Activations