INDEX
    Explanations

    references to programming languages

    Negative Logits
     estekak    -0.84
    "]];        -0.78
    "]]         -0.75
    ']]         -0.74
    ']],        -0.69
    <>          -0.68
    ]]          -0.68
    "]);        -0.67
    }]          -0.66
    }');        -0.66
    Positive Logits
    lang        2.56
     lang       1.63
    Lang        1.48
     Lang       1.48
    LANG        1.33
     LANG       1.30
    langs       1.17
    lange       0.96
    langen      0.82
     language   0.78
    Act Density 0.025%

    No Known Activations