    Explanations
    No Explanations Found
    Negative Logits
        Token                    Logit
        " only"                  -0.96
        " even"                  -0.95
        " не" (Russian "not")    -0.95
        " after"                 -0.93
        " might"                 -0.92
        " when"                  -0.92
        " would"                 -0.92
        " yet"                   -0.91
        " но" (Russian "but")    -0.91
        " may"                   -0.91

    Positive Logits
        Token                    Logit
        <bos>                    10.47
        " encomp"                 3.68
        " guarante"               3.64
        " affor"                  3.61
        " fuf"                    3.60
        " ?..."                   3.58
        " effe"                   3.57
        " increa"                 3.55
        " !..."                   3.52
        " squa"                   3.51
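On SAE feature dashboards, positive and negative logit lists like the ones above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the highest- and lowest-scoring tokens. The sketch below assumes that method; it is not confirmed by this page. `top_logits`, `decoder_dir`, `unembed`, and the toy vocabulary are hypothetical illustrations, not this feature's real weights.

```python
from typing import List, Tuple

Scored = List[Tuple[str, float]]


def top_logits(decoder_dir: List[float], unembed: List[List[float]],
               vocab: List[str], k: int = 10) -> Tuple[Scored, Scored]:
    """Hypothetical helper: score every token by decoder_dir @ unembed.

    Assumes unembed is laid out as [d_model][n_vocab].
    Returns (top-k positive, top-k negative) token/score pairs.
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per model dimension")
    if any(len(row) != len(vocab) for row in unembed):
        raise ValueError("each unembed row must have one entry per token")

    scores: Scored = []
    for t, token in enumerate(vocab):
        # Dot product of the decoder direction with column t of the unembedding.
        s = sum(decoder_dir[d] * unembed[d][t] for d in range(len(decoder_dir)))
        scores.append((token, s))

    ranked = sorted(scores, key=lambda pair: pair[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary.
    pos, neg = top_logits(
        decoder_dir=[1.0, -0.5],
        unembed=[[2.0, 0.0, -1.0, 0.5],
                 [0.0, 2.0, 1.0, 0.5]],
        vocab=["<bos>", " only", " even", " may"],
        k=2,
    )
    print(pos)  # [('<bos>', 2.0), (' may', 0.25)]
    print(neg)  # [(' even', -1.5), (' only', -1.0)]
```

Because the projection depends only on weights, a feature can show strong logit entries, like the <bos> value here, even when it never activates on real text.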
    Activation Density: 0.000%

    No Known Activations

    This feature has no known activations.
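Activation density is usually reported as the percentage of sampled tokens on which a feature's activation is positive, so a value of 0.000% fits the empty activation list above: the feature never fired in the sample. The sketch below assumes that definition and a threshold of 0; `activation_density` is a hypothetical helper, and the dashboard's actual sampling and threshold are not stated on this page.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds threshold."""
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    dead_feature = [0.0] * 1000          # never fires on 1000 sampled tokens
    print(f"{activation_density(dead_feature):.3f}%")           # 0.000%
    print(activation_density([0.0, 0.3, 0.0, 1.2]))             # 50.0
```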