    Explanations

    say positive attributes

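    Negative Logits
    (Quotes mark token boundaries; a leading space is part of the token.)
        Token         Value
        " work"       0.70
        " Work"       0.67
        " работу"     0.65   (Russian, "work")
        " warm"       0.63
        " lavoro"     0.62   (Italian, "work")
        " trabajo"    0.61   (Spanish, "work")
        " работы"     0.61   (Russian, "work")
        "Work"        0.59
        " Warm"       0.59
        " trabalho"   0.59   (Portuguese, "work")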
    Positive Logits
        Token         Value
        " fairly"     0.74
        "Quiet"       0.47
        " quiet"      0.46
        " top"        0.43
        " left"       0.42
        "quiet"       0.42
        "detail"      0.41
        " Quiet"      0.40
        "left"        0.39
        " detail"     0.39
    Activation density: 0.000%

    No known activations.