INDEX
    Explanations

    instances of the word "feature" and its variations in various contexts

    Negative Logits
    sWith           -0.16
    arin            -0.16
    enheim          -0.16
    /fire           -0.15
    ni              -0.14
     fever          -0.14
    oper            -0.14
    омер            -0.14
    ners            -0.14
    ses             -0.14
    Positive Logits
     prominently    0.35
    tte             0.26
    691             0.17
    eting           0.17
    547             0.16
    ettings         0.15
    eted            0.15
    472             0.15
    itarian         0.15
    ilities         0.15
    Activation Density 0.034%

    No Known Activations