INDEX
    Explanations

    describing purpose or nature

    Negative Logits
    lowercase
    0.49
    another
    0.47
    ان
    0.46
    pleasant
    0.46
    that
    0.45
    itimate
    0.43
    0.43
     aggravate
    0.42
    ر
    0.42
    pathetic
    0.42
    Positive Logits
     μέσα
    0.45
     везде
    0.44
    );
    0.43
    )--
    0.42
     addon
    0.40
    `;
    0.40
     unseres
    0.40
    แผน
    0.40
    versch
    0.40
     построен
    0.40
    Act Density 0.004%

    No Known Activations