INDEX
    Explanations

    Negative Logits
    Token             Logit
    "𝐜"               0.37
    " wrongfully"     0.37
    " govern"         0.37
    " xhrObj"         0.36
    " তথাপি"          0.36
    " quién"          0.35
    "<unused2148>"    0.35
    "<unused2152>"    0.35
    " dónde"          0.35
    "<unused2153>"    0.35
    Positive Logits
    Token             Logit
    "Featuring"       0.38
    "Stri"            0.38
    "The"             0.38
    "Guests"          0.37
    "Based"           0.36
    "Previously"      0.36
    "Offers"          0.35
    " Trained"        0.35
    "St"              0.34
    "Known"           0.34
    Activation Density: 0.001%

    No Known Activations