    Explanations

    Okay, conversational starter

    Negative Logits
    Token        Value
    "ap"         0.55
    "ie"         0.46
    "pleri"      0.46
    "ot"         0.45
    "it"         0.44
    "astom"      0.44
    "trace"      0.43
    "atoren"     0.43
    "ast"        0.42
    "ab"         0.42
    Positive Logits
    Token                   Value
    (token not rendered)    0.48
    " induces"              0.47
    " rosso"                0.47
    " motivates"            0.46
    (token not rendered)    0.46
    " aunts"                0.45
    "れて"                  0.45
    "低下"                  0.45
    " catalyzes"            0.45
    " passare"              0.44
    Activation Density: 0.002%

    No Known Activations