INDEX
    Negative Logits
    0.98
     ότι
    0.96
    0.94
     کہ
    0.92
     that
    0.92
    0.92
    0.91
     tokamaks
    0.91
     οποία
    0.90
     and
    0.90
    Positive Logits
    Token   Value
    p       1.33
    c       1.28
    t       1.22
    ut      1.19
    d       1.19
    (       1.14
    h       1.13
    s       1.11
    e       1.09
    o       1.05
    Act Density 0.020%

    No Known Activations