INDEX
    Explanations

    references to physical spaces or locations

    Negative Logits
    Logit   Token
    -1.71   "<bos>"
    -0.74
    -0.74   "<?"
    -0.72   "<?\n"
    -0.72   "\n"
    -0.70   "/*"
    -0.70   "/**"
    -0.69   "/***\n"
    -0.68   " prepare"
    -0.68   " continue"
    Positive Logits
    Logit   Token
    1.73    " space"
    1.67    " affor"
    1.67    " maneu"
    1.65    " wien"
    1.64    " Space"
    1.61    " accla"
    1.60    " fta"
    1.60    " SPACE"
    1.59    " increa"
    1.58    " aen"
    Activation Density: 0.125%

    No Known Activations