    NEGATIVE LOGITS
    Token        Value
    "1"          0.56
    "5"          0.50
    "7"          0.50
    "4"          0.49
    [missing]    0.48
    "2"          0.46
    "6"          0.46
    "9"          0.45
    "8"          0.44
    "0"          0.44
    POSITIVE LOGITS
    Token            Value
    " attaque"       0.41
    " Verkauf"       0.39
    " girlfriend"    0.38
    " soms"          0.38
    " disputes"      0.37
    " laisser"       0.37
    " envoyer"       0.37
    " perdió"        0.37
    " mischiev"      0.37
    " نحاول"         0.36
    Activation Density: 0.442%

    No Known Activations