    Explanations
    No Explanations Found
    Negative Logits
    Token       Logit
    " ent"      0.43
    " n"        0.43
    " s"        0.41
    "al"        0.39
    " str"      0.38
    "os"        0.37
    " о"        0.36
    " inter"    0.35
    "h"         0.35
    " ac"       0.35
    Positive Logits
    Token           Logit
    <unused530>     0.64
    <unused414>     0.61
    <unused637>     0.60
    <unused279>     0.60
    <unused1894>    0.59
    <unused1770>    0.58
    <unused1881>    0.57
    <unused1794>    0.57
    garakan         0.57
    <unused552>     0.57
    Activation Density: 1.974%

    No Known Activations