    Negative Logits
    他說         0.43
     vuole       0.40
     piensan     0.39
     voulez      0.38
     chcesz      0.38
     savent      0.38
     quiser      0.38
     haría       0.37
     quieres     0.37
     bunu        0.37
    Positive Logits
     belongs      0.49
     belong       0.46
     consists     0.44
     represents   0.44
     serves       0.43
     belonged     0.43
     comprises    0.42
     được         0.41
                  0.40
    belong        0.40
    Activation Density 0.381%

    No Known Activations