INDEX
    Explanations
    No Explanations Found
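    Negative Logits
    " "                  0.87
    " which"             0.82
    " of"                0.81
    " които"             0.78   (Bulgarian: "which")
    "2"                  0.76
    " 因为"              0.75   (Chinese, simplified: "because")
    "3"                  0.75
    "5"                  0.75
    " 因為"              0.74   (Chinese, traditional: "because")
    "6"                  0.74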
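    Positive Logits
    " preferências"      0.84   (Portuguese: "preferences")
    " notícias"          0.83   (Portuguese: "news")
    " teamwork"          0.80
    " pertinente"        0.79   (Portuguese/Spanish: "pertinent")
    " divertido"         0.78   (Portuguese/Spanish: "fun")
    " relevante"         0.78   (Portuguese/Spanish: "relevant")
    " ответственность"   0.75   (Russian: "responsibility")
    " vínculos"          0.75   (Portuguese/Spanish: "ties, bonds")
    " coercion"          0.75
    " смысл"             0.75   (Russian: "meaning, sense")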
    Activation Density: 0.001% (roughly one token in 100,000)
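    The activation density figure can be read as the share of sampled tokens on which the feature fires. The sketch below is minimal and makes an assumption: density is taken to be the fraction of tokens whose activation exceeds zero. The dashboard's exact threshold and token sample are not stated here, and both `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: 'density' means the share of sampled tokens on which the
    feature is active (activation > threshold); the real dashboard may differ.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)


# Hypothetical sample: the feature fires on 1 token out of 100,000.
sample = [0.0] * 99_999 + [3.1]
density = activation_density(sample)
print(f"{density:.3%}")  # 0.001%
```

    At this density a feature fires so rarely that a modest sample of text may contain no activating examples at all.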

    No Known Activations