INDEX
    NEGATIVE LOGITS
    "film"           -0.06
    "Lists"          -0.06
    " coer"          -0.06
    " desp"          -0.06
    ".trace"         -0.06
    " practical"     -0.06
    " VI"            -0.06
    " Bowie"         -0.06
    " ================================="  -0.06
    "_POL"           -0.06
    POSITIVE LOGITS
    " td"            0.08
    " vào"           0.07
    "(td"            0.07
    " reasoning"     0.07
    " Inn"           0.06
    "Christopher"    0.06
    " نتیجه"         0.06
    " Koch"          0.06
    "ddy"            0.06
    ".AddDays"       0.06
    Activation Density: 0.004%

    No Known Activations