INDEX
    Explanations

    words following specific introductions

    Negative Logits
    Token          Logit
    (not shown)    0.88
    "🉐"           0.87
    "муще"         0.84
    "🍘"           0.84
    "🈂"           0.83
    "🈶"           0.83
    "ttamente"     0.83
    "ociazione"    0.82
    " vecchio"     0.81
    "🈺"           0.81
    Positive Logits
    Token          Logit
    " It"          1.90
    " However"     1.85
    " They"        1.84
    " Also"        1.83
    " Those"       1.79
    " Our"         1.79
    " Many"        1.78
    " So"          1.78
    " Some"        1.77
    " Perhaps"     1.76
    Activation Density: 0.311%

    No Known Activations