INDEX
    Explanations

    understandable systems or languages

    Negative Logits
    Token              Logit
    "Р"                0.52
    "Ар"               0.48
    "Ч"                0.48
    "Д"                0.46
    (not shown)        0.46
    " ugl"             0.45
    " определенных"    0.45
    "Ви"               0.45
    "Ал"               0.45
    "Ш"                0.45
    Positive Logits
    Token              Logit
    "igree"            0.50
    "nurse"            0.49
    "lude"             0.48
    "virus"            0.47
    "timeline"         0.47
    " timeline"        0.46
    "Barnes"           0.46
    "bleau"            0.45
    "preview"          0.45
    " Herpes"          0.44
    Activation Density: 0.036%

    No Known Activations