INDEX
    Explanations
    Negative Logits
    lek
    -0.68
     Te
    -0.67
     alarm
    -0.67
    5
    -0.67
    le
    -0.66
    Le
    -0.63
    lege
    -0.62
    isk
    -0.61
    Ges
    -0.61
     Leg
    -0.61
    Positive Logits
     />
    1.90
    />
    1.71
    "/>
    1.61
     />\
    1.56
     />↵
    1.56
    }}/>
    1.44
    }/>
    1.37
     />';
    1.28
    />↵
    1.28
     />";
    1.27
    Activation Density: 0.035%

    No Known Activations