    Explanations

    phrases that indicate a focus on examples or instructional content

    Negative Logits
    vier
    -0.16
    رس
    -0.15
    quirer
    -0.14
     переп
    -0.14
    aki
    -0.13
    yar
    -0.13
    dzi
    -0.13
    iro
    -0.13
    ieri
    -0.13
    scri
    -0.13
    Positive Logits
     example
    0.35
    unately
    0.27
     instance
    0.27
     exemple
    0.26
     Example
    0.25
    cing
    0.25
    example
    0.25
    -example
    0.21
     مثال
    0.20
     details
    0.20
    Activation Density 0.063%

    No Known Activations