    NEGATIVE LOGITS
    Token            Logit
    ","              1.05
    " belangrijk"    0.83
    " juist"         0.81
    "<unused2213>"   0.80
    "ء"              0.80
    ",【"             0.77
    " terwijl"       0.76
    "ε"              0.76
    " eigenlijk"     0.76
                     0.75
    POSITIVE LOGITS
    Token            Logit
    "one"            1.52
    "for"            1.49
    " for"           1.20
    "ai"             1.17
    "are"            1.13
    "ou"             1.13
    "i"              1.10
    "if"             1.09
    "1"              1.09
    "os"             1.06
    Activation Density: 0.000%

    No Known Activations