    Negative Logits
     leben
    -0.06
    flix
    -0.06
    َت
    -0.06
    _ticks
    -0.06
     countries
    -0.06
    .dd
    -0.06
    _OD
    -0.06
    707
    -0.06
     satire
    -0.06
    orks
    -0.06
    Positive Logits
     duly
    0.07
     Maths
    0.07
    //------------------------------------------------------------------------------↵
    0.07
    	cr
    0.07
     brave
    0.07
     ----------------------------------------------------------------------------
    0.06
    //---------------------------------------------------------------------------↵↵
    0.06
    <Scalar
    0.06
     "",↵
    0.06
    >R
    0.06
    Act Density 0.002%

    No Known Activations