INDEX
    Explanations

    the first capital biggest

    Negative Logits
    (blank)  0.39
    "v"  0.38
    (blank)  0.38
    " fournis"  0.38
    "این"  0.36
    "hoz"  0.36
    "Hej"  0.36
    "ng"  0.36
    (blank)  0.35
    "که"  0.35
    Positive Logits
    " opposed"  0.44
    "पति"  0.43
    "ymmet"  0.42
    " MLC"  0.42
    " solcher"  0.41
    " importantly"  0.40
    " ответственность"  0.39
    " przyczyn"  0.38
    " stara"  0.38
    " टी"  0.38
    Act Density 0.337%

    No Known Activations