    Explanations

    sentences that end in "to" followed by high activations

    Negative Logits
    <bos>        -2.96
    (not shown)  -0.80
    /**          -0.77
    <?           -0.76
    (blank)      -0.70
    الد          -0.63
    putnik       -0.59
    /*           -0.57
    //           -0.57
    (not shown)  -0.56
    Positive Logits
     lamborghini  1.54
     scrat        1.53
     affor        1.52
     maneu        1.48
     impra        1.47
     accla        1.41
     disreg       1.39
     panama       1.38
     isuzu        1.37
     Minang       1.36
    Act Density 0.224%

    No Known Activations