INDEX
    Explanations
    Negative Logits
    " without"        -1.04
    " completely"     -1.03
    " very"           -1.00
    " just"           -0.99
    " entirely"       -0.96
    (not shown)       -0.94
    " очень"          -0.94
    " what"           -0.93
    " варіан"         -0.91
    " sehr"           -0.91
    Positive Logits
    " DOWN"           0.92
    " dow"            0.91
    "dow"             0.88
    " ndani"          0.85
    " declaração"     0.83
    " verborgen"      0.82
    " temprano"       0.81
    (not shown)       0.81
    "Chi"             0.80
    "inib"            0.80
    Act Density 0.002%

    No Known Activations