INDEX
    Explanations

    robot or robotic mentions

    Negative Logits
    ן                   1.27
    (no token shown)    1.25
    y                   1.17
    h                   1.16
    (no token shown)    1.16
     is                 1.09
    ри                  1.07
    ווי                 1.07
    the                 1.06
    i                   1.05
    Positive Logits
    }'                  0.96
     an                 0.94
    д                   0.91
    ),                  0.89
    )’                  0.88
    ");                 0.87
    ъ                   0.86
    (no token shown)    0.84
     beş                0.84
    lll                 0.84
    Act Density 0.222%

    No Known Activations