    Negative Logits
    Token            Logit
    " caric"         -0.08
    " grooming"      -0.08
    " induct"        -0.07
    " regarding"     -0.07
    " Lima"          -0.07
    "ub"             -0.07
    "little"         -0.07
    " buried"        -0.07
    (not rendered)   -0.07
    " laser"         -0.07
    Positive Logits
    Token            Logit
    "Ago"            0.08
    (not rendered)   0.08
    " polymer"       0.08
    "Tat"            0.07
    " technician"    0.07
    (not rendered)   0.07
    " Staat"         0.07
    " axle"          0.07
    " Demo"          0.07
    " idol"          0.07
    Activation Density: 0.006%

    No Known Activations