INDEX
    Explanations

    numerical values or counts

    Negative Logits
    " Thirty"       -0.54
    "3"             -0.54
    "Thirty"        -0.49
    " Thirdly"      -0.47
    " trio"         -0.47
                    -0.46
    " الثلاث"       -0.46
    " WEDNESDAY"    -0.46
    " rois"         -0.45
                    -0.45
    Positive Logits
    "WriteTagHelper"    0.64
    " beginnetje"       0.58
    "UnsafeEnabled"     0.58
    "UnusedPrivate"     0.56
    "retweeted"         0.54
    " surla"            0.53
    " adaptation"       0.53
    "findpost"          0.52
    " ninth"            0.52
    "adaptation"        0.51
    Act Density 0.037%

    No Known Activations