INDEX
    Explanations

    specific words that indicate presence or existence in various contexts

    Negative Logits
        Token           Logit
        "endance"       -0.15
        "Slf"           -0.14
        "Argb"          -0.14
        "[section"      -0.13
        " distracted"   -0.13
        " distract"     -0.13
        "टन"            -0.12
        "erg"           -0.12
        " Pivot"        -0.12
        "_printf"       -0.12
    Positive Logits
        Token           Logit
        "front"         0.18
        "rette"         0.18
        "ret"           0.17
        "-front"        0.17
        " molec"        0.17
        " front"        0.16
        "opoly"         0.15
        "-ret"          0.15
        " Front"        0.15
        " fronts"       0.15
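The logit lists rank tokens by how strongly this feature pushes their output logits down or up. This page does not say how those numbers are computed. One common approach for dashboards of this kind, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector and then sort the results. The minimal pure-Python sketch below uses made-up 2-dimensional numbers. `logit_effects` and `top_k` are hypothetical helper names, not a real library API.

```python
from typing import Dict, List, Tuple


# Hypothetical helpers: the dashboard does not document its method.
def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_k(effects: Dict[str, float],
          k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k lowest-scoring and k highest-scoring (token, logit) pairs."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most suppressed tokens, most negative first
    positive = ranked[::-1][:k]  # most promoted tokens, most positive first
    return negative, positive


# Toy example with made-up numbers (not this feature's real weights).
decoder_dir = [1.0, -0.5]
unembed = {
    "front": [0.2, 0.0],
    " distract": [-0.1, 0.1],
    "ret": [0.1, -0.1],
    "erg": [0.0, 0.2],
}
negative, positive = top_k(logit_effects(decoder_dir, unembed), k=2)
print("negative:", negative)
print("positive:", positive)
```

In a real model the unembedding matrix covers the full vocabulary. The same sort then yields lists shaped like the ten-entry tables above.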
    Activation Density: 0.027%
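Activation density is presumably the percentage of sampled tokens on which the feature fires. The page does not state the firing threshold or the token sample, so both are assumptions in the sketch below. At 0.027%, the feature would fire on roughly 27 of every 100,000 tokens. `activation_density` is a hypothetical helper, and the sample is made up to reproduce that figure.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Made-up sample: 27 firing positions out of 100,000 tokens.
sample = [0.0] * 99_973 + [1.3] * 27
print(f"{activation_density(sample):.3f}%")
```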

    No Known Activations