INDEX
    Explanations
    Negative Logits
        "wi"        -0.82
        "WI"        -0.79
        "s"         -0.57
        " Katz"     -0.54
        " CTA"      -0.52
        "sing"      -0.50
        "sd"        -0.49
        "ikaze"     -0.49
        "CTA"       -0.48
        "IVATE"     -0.47
    Positive Logits
        "AccessorTable"   0.97
        " Reſ"            0.81
        " ویکی‌پدیا"       0.80
        " itſelf"         0.78
        "#+#"             0.77
        " Diſ"            0.77
        " purpoſe"        0.76
        "ſelves"          0.75
        " mergeFrom"      0.74
        " Eſ"             0.73
    Activation Density: 0.225%

    No Known Activations