INDEX
    Explanations

    proper nouns and specific terms related to names and titles

    Negative Logits
    <bos>               -2.70
    (blank)             -1.14
    (blank)             -1.12
    <?                  -0.95
    /**                 -0.94
    #                   -0.77
     initComponents     -0.77
    /*                  -0.71
    intios              -0.67
    contentLoaded       -0.67
    Positive Logits
     aen                1.84
     Juf                1.72
     thut               1.72
     dises              1.71
     nece               1.67
     emphat             1.65
     inev               1.64
     mef                1.63
     meis               1.63
     fta                1.63
    Activation Density: 0.175%

    No Known Activations