INDEX
    Explanations

    phrases and structures indicating original content and examples

    Negative Logits
    ingt        -0.14
    zbollah     -0.14
    jed         -0.13
    ug          -0.13
    ovic        -0.13
    wald        -0.13
     ============================================================================↵  -0.13
    porto       -0.13
    owitz       -0.12
    опри        -0.12
    Positive Logits
    iser         0.17
    antal        0.15
     Westbrook   0.15
    assertCount  0.14
    πη           0.14
    atat         0.13
    ालक          0.13
    inch         0.13
    licken       0.13
    robots       0.13
    Activation Density 0.063%

    No Known Activations