INDEX
    Explanations

    numbers and math operations

    NEGATIVE LOGITS
    σ                0.39
    polarisation     0.37
    Wittgenstein     0.36
    ద్వ               0.36
    Switcher         0.34
    cereals          0.33
    riots            0.33
    cities           0.32
    rule             0.31
    γ                0.31
    POSITIVE LOGITS
    amanho               0.35
    tỏ                   0.35
    чних                 0.33
    (no visible token)   0.33
    које                 0.33
    atum                 0.33
    さまざまな           0.32
    (no visible token)   0.31
    یکی                  0.31
    (no visible token)   0.31
    Act Density 0.006%

    No Known Activations