    Explanations

    references to images and their credits in the text

    Negative Logits
    " ActionTypes"    -0.16
    "æħİ"             -0.16
    "SSF"             -0.15
    " tune"           -0.15
    "ubat"            -0.15
    "عÙĨÙĪØ§ÙĨ"        -0.14
    "unta"            -0.14
    "erson"           -0.14
    "izens"           -0.14
    "gil"             -0.14

    Positive Logits
    " PA"             0.35
    "PA"              0.32
    " pa"             0.22
    "_PA"             0.21
    "Mirror"          0.19
    "pa"              0.19
    " Mirror"         0.18
    " PRESS"          0.18
    ".pa"             0.18
    " Pa"             0.18
    Activation Density: 0.015%

    No Known Activations