INDEX
    Explanations

    code scores / metrics

    Negative Logits
    Token             Logit
    "Kir"             -0.07
    " Directorate"    -0.06
    " Planet"         -0.06
    " Thesis"         -0.06
    " TAM"            -0.06
    " Phillips"       -0.06
    "Climate"         -0.06
    "urent"           -0.06
    "_Process"        -0.06
    "رخ"              -0.06
    Positive Logits
    Token             Logit
    ".blog"           0.07
    "شود"             0.07
    "-important"      0.07
    ":before"         0.06
    " blue"           0.06
    "ager"            0.06
    "????"            0.06
    "ิญ"              0.06
    " sch"            0.06
    "UNDER"           0.06
    Activation Density: 0.000%

    No Known Activations