    Negative Logits
    Token                 Logit
    " AssemblyCompany"    -0.77
    " CascadeType"        -0.75
    "TagMode"             -0.72
    " صوتيه"              -0.71
    "Autoritní"           -0.69
    "transQ"              -0.68
    " informée"           -0.68
    "ロウィン"            -0.68
    " EconPapers"         -0.68
    "/**"                 -0.66
    Positive Logits
    Token                 Logit
    "ll"                  0.64
    "re"                  0.57
    "нибудь"              0.41
    " theyre"             0.41
    "RE"                  0.37
    "ill"                 0.37
    "figure"              0.34
    "le"                  0.32
    " figure"             0.32
                          0.31
    Activation Density: 0.119%

    No Known Activations