    Negative Logits
        "ǎi"          0.43
        "তো"          0.40
        " almo"       0.40
        "simile"      0.39
        "វា"          0.39
        " собира"     0.39
        "lava"        0.39
        "avorites"    0.39
        " teorema"    0.38
        " आसपास"      0.38
    Positive Logits
        "راج"         0.44
        "NEUTRON"     0.40
        "ດ້ານ"        0.39
        " Musicians"  0.39
        "AndView"     0.38
        " కళ"         0.37
        (no token shown)  0.37
        "日本人"      0.36
        "าระ"         0.36
        " fallen"     0.36
    Activation Density: 0.003%

    No Known Activations