    Negative Logits
     their            -1.78
    (no token shown)  -1.75
    '                 -1.71
     ‘                -1.70
     its              -1.70
     बजाय             -1.66
    ’’                -1.66
     cbd              -1.55
    "—                -1.52
    ʽ                 -1.49
    Positive Logits
    If                1.98
    7                 1.94
    according         1.86
    Karena            1.80
    Additionally      1.78
     مطرح             1.78
    Karakteristik     1.77
    4                 1.77
    8                 1.74
    Jangan            1.70
    Act Density 0.027%

    No Known Activations