INDEX
    Explanations

    statistics, data, and analysis

    Negative Logits
    Token        Value
     was         1.00
    \            0.92
    "            0.88
    !            0.77
    (not shown)  0.75
    abilir       0.72
    ->           0.70
     to          0.70
    borhood      0.70
    ))           0.69
    Positive Logits
    Token        Value
    w            1.07
    c            1.05
    <0x80>       0.98
    ת            0.97
    r            0.95
    (not shown)  0.93
    (not shown)  0.92
    (not shown)  0.92
    j            0.87
    ва           0.86
    Activation Density: 0.014%

    No Known Activations