INDEX
    Explanations

    relative pronouns

    Negative Logits
     الدم    -0.08
    :event    -0.07
     الحل    -0.07
     departing    -0.07
     занима    -0.07
    .Ap    -0.07
     Kant    -0.07
    .dim    -0.07
     neutron    -0.06
     ..    -0.06
    Positive Logits
     architect    0.07
    central    0.07
     Autonomous    0.06
     userId    0.06
    ropic    0.06
    -/    0.06
    (token not rendered)    0.06
    ɩ    0.06
    借用    0.06
    (token not rendered)    0.06
    Act Density 0.049%

    No Known Activations