    Explanations

    self-exploration and components

    NEGATIVE LOGITS
    Agriculture      0.47
     puterea         0.44
                     0.44
                     0.44
     الزرا           0.43
    Pride            0.43
                     0.43
     γά              0.42
     typographic     0.41
                     0.41
    POSITIVE LOGITS
    inac           0.49
    ritas          0.47
     strapping     0.45
     revealed      0.45
    x              0.45
     حدی           0.44
     followed      0.44
    to             0.44
     然后          0.43
    us             0.43
    Activation Density 0.004%

    No Known Activations