INDEX
    Explanations

    twin studies

    NEGATIVE LOGITS
    " professional"    -0.07
    " complaints"      -0.07
    "_actor"           -0.07
    " sword"           -0.06
    "Name"             -0.06
    " Coch"            -0.06
    " drank"           -0.06
    " practiced"       -0.06
    " summarize"       -0.06
    " Connected"       -0.06
    POSITIVE LOGITS
    " nhìn"            0.07
    "vangst"           0.06
    " інт"             0.06
    " stair"           0.06
    " baru"            0.06
    " row"             0.06
    " demol"           0.06
    "\trow"            0.06
    " grille"          0.06
    "-orders"          0.06
    Activation Density: 0.012%

    No Known Activations