INDEX
    NEGATIVE LOGITS
    م  0.50
    فض  0.46
    (no token text)  0.45
    жи  0.44
    (no token text)  0.44
    Stroke  0.43
    Fus  0.43
    Thesis  0.43
    د  0.43
     Balk  0.42
    POSITIVE LOGITS
     구해  0.50
     difficulté  0.49
     redução  0.46
     décide  0.46
     victimes  0.46
     competição  0.45
     bunu  0.44
    (no token text)  0.44
     trajet  0.44
     comunicação  0.43
    Activation Density: 0.000%

    No Known Activations