INDEX
    Explanations

    lists following specific keywords

    Negative Logits
     violating      0.45
     complying      0.44
     justifies      0.43
     setup          0.42
     office         0.41
     justifying     0.39
                    0.38
     infringing     0.38
     responsive     0.38
    🆕              0.38
    Positive Logits
     великолеп      0.52
     tcpHeader      0.46
     idxf           0.45
     heartily       0.45
     maravilh       0.44
     хорошо         0.44
     القلب          0.44
    豐富            0.43
     rzeczy         0.43
     Mox            0.43
    Activation Density: 0.004%

    No Known Activations