INDEX
    Explanations

    instances of specific actions or occurrences within the text

    Negative Logits
    ulg
    -0.20
    ulis
    -0.17
    arios
    -0.16
    iores
    -0.16
    zew
    -0.16
    ecta
    -0.16
    assin
    -0.15
     charts
    -0.15
    ycin
    -0.15
    iciel
    -0.15
    Positive Logits
    백
    0.16
     Frank
    0.16
     circ
    0.16
     Norm
    0.15
    cline
    0.15
     Crescent
    0.15
     Opt
    0.14
     Cree
    0.14
    ila
    0.14
    rr
    0.14
    Act Density 0.008%

    No Known Activations