INDEX
    Explanations

    reasoning or justification

    Negative Logits
    ээ              0.47
     muziek         0.44
     newMovie       0.42
     jaane          0.42
     seconda        0.41
     music          0.41
    тература        0.40
    widet           0.39
     nuove          0.39
     laughs         0.39
    Positive Logits
     कर्             0.41
     muốn           0.40
     Similarly      0.40
     dispositivo    0.39
     нуле           0.39
     Zero           0.38
     जीरो           0.38
     zero           0.37
    bate            0.37
     didn           0.37
    Act Density 0.000%

    No Known Activations