    Explanations

    phrases that introduce simplifications or clarifications in explanations

    Negative Logits
    Token       Logit
    "deaux"     -0.17
    "eprom"     -0.16
    "ιά"        -0.15
    "ÙģÛĮ"      -0.15
    " Äijâu"    -0.15
    "hof"       -0.14
    "öm"        -0.14
    "ysl"       -0.14
    "rous"      -0.13
    Positive Logits
    Token       Logit
    " put"      0.69
    " Put"      0.63
    "Put"       0.54
    " puts"     0.52
    ".put"      0.52
    "put"       0.50
    " PUT"      0.47
    "_put"      0.45
    ".Put"      0.43
    "puts"      0.40
    Activation Density: 0.144%

    No Known Activations