    Explanations
    No Explanations Found
    Negative Logits
    "<unused1935>"         2.10
    "<unused697>"          2.08
    "<unused398>"          2.02
    (token not rendered)   1.95
    "<unused2077>"         1.94
    (token not rendered)   1.94
    "<unused1208>"         1.93
    "<unused1898>"         1.92
    "<unused1976>"         1.92
    "<unused1957>"         1.92
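    Positive Logits
    " this"      1.50
    " these"     1.31
    "this"       1.31
    " данного"   1.12   (Russian, "of this" / "of the given")
    "these"      1.04
    " such"      1.04
    " этого"     1.01   (Russian, "of this")
    " này"       0.98   (Vietnamese, "this")
    " данной"    0.97   (Russian, feminine "of this" / "of the given")
    " our"       0.92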
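Logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding vectors and ranking the tokens by the result. The sketch below assumes that convention (and ignores any final LayerNorm); it is not confirmed to be exactly how this dashboard computes its values. The helper names `logit_effects` and `top_tokens` and the toy 2-dimensional vectors are hypothetical and chosen only for illustration. The sketch keeps the sign, so suppressed tokens come out negative.

```python
from typing import Dict, List, Tuple


def logit_effects(direction: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec))
            for tok, vec in unembed.items()}


def top_tokens(effects: Dict[str, float], k: int,
               positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens the feature most promotes (positive=True) or most suppresses."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Hypothetical toy example: real models use d_model-sized vectors and a full vocabulary.
direction = [1.0, 0.5]
unembed = {
    " this": [1.25, 0.5],
    " these": [1.0, 0.5],
    "<unused1935>": [-2.0, -0.5],
}
effects = logit_effects(direction, unembed)
print(top_tokens(effects, 1))                  # [(' this', 1.5)]
print(top_tokens(effects, 1, positive=False))  # [('<unused1935>', -2.25)]
```

Sorting the whole vocabulary is fine for a sketch; for a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` avoids a full sort.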
    Act Density 0.538%
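Activation density is usually read as the fraction of sampled token positions at which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0.0 over some sample of tokens. Both the threshold and the sample are assumptions, since the page does not state them, and `activation_density` is a hypothetical helper name.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose feature activation exceeds the threshold."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 538 firing positions out of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(538):
    acts[i * 100] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.538%
```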

    No Known Activations