    Explanations

    references to file operations and formats

    Negative Logits
    Token           Logit
    "ipar"          -0.17
    " Tavern"       -0.15
    "umat"          -0.15
    "dar"           -0.15
    " Dar"          -0.15
    "uner"          -0.15
    "rimon"         -0.14
    "uro"           -0.14
    "iva"           -0.14
    " Martha"       -0.14
    Positive Logits
    Token           Logit
    " extension"    0.69
    "extension"     0.61
    " extensions"   0.60
    " Extension"    0.60
    "-extension"    0.57
    "Extension"     0.56
    "_extension"    0.53
    " Extensions"   0.51
    ".extension"    0.51
    "extensions"    0.50
    Activation Density: 0.085%

    No Known Activations