INDEX
    Explanations

    numerical data related to experimental results

    Negative Logits
    зулта             -0.61
    ukone             -0.55
     باخ              -0.55
     ve               -0.55
    estad             -0.54
    InjectAttribute   -0.54
    ^^^^^^^^          -0.53
    Galería           -0.53
    roughs            -0.53
    :]:               -0.52
    Positive Logits
    0                 1.00
    ########.         0.83
    3                 0.83
    1                 0.82
    5                 0.82
    2                 0.82
    4                 0.78
    6                 0.78
    7                 0.74
    8                 0.73
    Act Density 0.399%

    No Known Activations