    NEGATIVE LOGITS
     ছড়িয়ে
    0.46
     επισ
    0.46
     က
    0.45
    0.45
    <unused256>
    0.45
    Subtraction
    0.45
     tuttavia
    0.45
     سلاٹس
    0.44
     möjligt
    0.43
    𝔻
    0.43
    POSITIVE LOGITS
     older
    0.54
    i
    0.49
    fangen
    0.46
     already
    0.46
    י
    0.45
     reformed
    0.44
     benefitted
    0.42
     appreciated
    0.42
     previously
    0.42
    globe
    0.41
    Act Density 0.004%

    No Known Activations