INDEX
    Explanations

    core principles violated

    Negative Logits
    " संप"               0.70
    " copper"            0.69
    "累積"               0.67
    "હુ"                 0.67
    "Ign"                0.67
    " underestimated"    0.66
    " mom"               0.65
                         0.65
    " Californians"      0.65
                         0.65
    Positive Logits
    " central"           0.77
    " hallmark"          0.77
    " blatant"           0.74
    " ciri"              0.73
    " direct"            0.67
    " सख्ती"             0.67
    "明显"               0.66
    " intentional"       0.66
    " parody"            0.66
    " defining"          0.65
    Act Density 0.596%

    No Known Activations