INDEX
    Explanations

    good answer descriptions

    Negative Logits
        Token           Value
        "т"             0.58
        "t"             0.51
        (not shown)     0.48
        "Experiment"    0.47
        (not shown)     0.47
        "гән"           0.45
        "itação"        0.45
        " карка"        0.42
        "Brooks"        0.41
        "iénd"          0.41
    Positive Logits
        Token           Value
        " velmi"        0.50
        " dvou"         0.48
        " bagels"       0.47
        " bastante"     0.46
        " muito"        0.45
        " Muito"        0.44
        " wygląda"      0.44
        " molto"        0.44
        " bonnes"       0.44
        "不错"          0.43
    Act Density 0.003%

    No Known Activations