INDEX
    Negative Logits
        Token             Logit
        "?"               0.87
        "."               0.84
        " keen"           0.76
        "ัด"               0.76
        " rightly"        0.75
        " sicherlich"     0.73
        (not rendered)    0.73
        " certainly"      0.73
        "haps"            0.72
        " judicious"      0.71
    Positive Logits
        Token             Logit
        <unused483>       1.33
        <unused2173>      1.20
        <unused1022>      1.20
        <unused2155>      1.20
        <unused576>       1.18
        <unused446>       1.18
        <unused184>       1.16
        <unused481>       1.15
        <unused456>       1.15
        <unused654>       1.14
    Activation Density: 1.889%

    No Known Activations