    Explanations

    questions about motivation or cause

    Negative Logits
    " alles"                1.01
    " ANYTHING"             0.95
    " anything"             0.94
    " גם"                   0.88
    (token not rendered)    0.85
    " Anything"             0.84
    "YNAM"                  0.84
    " logros"               0.84
    " tudo"                 0.82
    " extravaganza"         0.82
    Positive Logits
    " these"                0.85
    " use"                  0.84
    "These"                 0.78
    " použití"              0.73
    "these"                 0.72
    "হল"                    0.72
    (token not rendered)    0.71
    "θη"                    0.70
    "вих"                   0.69
    " ili"                  0.69
    Activation Density: 0.128%

    No Known Activations