INDEX
    Explanations

    fragments of foreign words

    Negative Logits
    ні  0.75
    ين  0.71
    h  0.62
    ре  0.59
    ur  0.59
    0.59
    v  0.58
    ди  0.56
    Т  0.54
    ٹ  0.54
    Positive Logits
     was  0.58
    >  0.56
     recib  0.52
    ä  0.50
     hanno  0.49
     leit  0.48
     have  0.47
    0.46
    '  0.45
     been  0.45
    Act Density 0.000%

    No Known Activations