INDEX
    Explanations
    Negative Logits
     drowning       0.46
    terror          0.44
     elephants      0.43
     middlemen      0.42
    дова            0.41
    lasting         0.40
    कुर             0.40
     loneliness     0.40
    🐋              0.39
                    0.39
    Positive Logits
     giảm           0.50
     até            0.49
     until          0.44
     procedimento   0.44
     sogar          0.42
    如果是          0.41
     ridurre        0.40
     forgiving      0.39
     ermöglicht     0.39
     nhẹ            0.38
    Act Density 0.020%

    No Known Activations