    Negative Logits
    (token not rendered)    1.19
    " making"               0.82
    " buying"               0.80
    " Plan"                 0.77
    " Bro"                  0.73
    " Bren"                 0.72
    " train"                0.72
    " slow"                 0.72
    " practicar"            0.71
    "↵↵"                    0.70
    Positive Logits
    "Sunglasses"            0.89
    "Benzyloxy"             0.88
    " कपकेक"                0.87
    " 법칙"                  0.86
    " পরস্প"                0.85
    "Bathroom"              0.85
    " adjunction"           0.83
    (token not rendered)    0.81
    " क्रमश"                0.80
    " douze"                0.80
    Act Density 0.002%

    No Known Activations