INDEX
    Explanations
    Negative Logits
    Token            Logit
    " укра"          -0.07
    "Tro"            -0.07
    " Executor"      -0.06
    " arz"           -0.06
    " klíč"          -0.06
    " capability"    -0.06
    "Threshold"      -0.06
    ".Linear"        -0.06
    "แสง"            -0.06
    " Judaism"       -0.06
    Positive Logits
    Token            Logit
    "_starts"        0.06
    " Belediyesi"    0.06
    "BUM"            0.06
    " Known"         0.06
    "aligned"        0.06
    "іч"             0.06
    "Getty"          0.06
    ".equal"         0.06
    ".he"            0.06
    "овари"          0.06
    Activation Density: 0.022%

    No Known Activations