    Explanations

    phrases that contain questions about definitions and explanations of concepts

    Negative Logits
    Token       Logit
    "ehler"     -0.15
    "empl"      -0.15
    "UBL"       -0.15
    "UCH"       -0.14
    "ift"       -0.14
    "ansen"     -0.14
    "ebo"       -0.14
    "ily"       -0.14
    "_TLS"      -0.14
    "immel"     -0.14
    Positive Logits
    Token       Logit
    "ĵåIJį"     0.15
    "enza"      0.14
    "warz"      0.14
    "adlo"      0.14
    "pio"       0.13
    " Daniels"  0.13
    "ÃŃsto"     0.13
    "हन"        0.13
    ".opens"    0.13
    " Scre"     0.13
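    The two logit lists read like the k lowest and k highest entries of a single per-token score vector. On sparse-autoencoder dashboards that vector is commonly the feature's output direction projected onto the model's unembedding, but this page does not say so, so treat that as an assumption. A minimal sketch of the selection step, using a few values copied from the lists above as hypothetical input (the function name `top_and_bottom` is illustrative only):

```python
# Hypothetical scores: a real vector would have one entry per vocabulary token;
# these few values are copied from the lists above purely for illustration.
scores: dict[str, float] = {
    "ehler": -0.15,
    "empl": -0.15,
    "ansen": -0.14,
    " Daniels": 0.13,
    "enza": 0.14,
    "warz": 0.14,
}


def top_and_bottom(scores: dict[str, float], k: int) -> tuple[list, list]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    # Ascending sort: most negative scores first (ties keep insertion order).
    negative = sorted(scores.items(), key=lambda item: item[1])[:k]
    # Descending sort: most positive scores first (ties keep insertion order).
    positive = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    return negative, positive


neg, pos = top_and_bottom(scores, k=2)
print(neg)  # [('ehler', -0.15), ('empl', -0.15)]
print(pos)  # [('enza', 0.14), ('warz', 0.14)]
```

    Note that tokens are kept as raw vocabulary strings, so a leading space (as in " Daniels") is part of the token.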
    Activation Density: 0.073%
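    Activation density is read here as the percentage of tokens in some evaluation corpus on which the feature's activation is above zero. The page states neither the corpus nor the threshold, so both are assumptions in this minimal sketch, and the data below is synthetic, constructed only so that the result matches the 0.073% shown:

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0.0)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic data: 100,000 tokens, of which 73 have a positive activation.
acts = [0.0] * 100_000
for i in range(73):
    acts[i * 1000] = 1.0

print(f"{activation_density(acts):.3f}%")  # 0.073%
```

    At this density the feature fires on roughly 1 token in 1,370, which is why a dashboard can report a nonzero density while still having no stored example activations.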

    No Known Activations