    Negative Logits
    "="            1.00
    "並且"         0.85
    ".="           0.83
    "并且"         0.83
    " แต"          0.82
    (not rendered) 0.81
    " ="           0.81
    "并通过"       0.77
    "=-"           0.75
    "&="           0.75
    Positive Logits
    ">∈</"         0.95
    " in"          0.69
    " listed"      0.69
    (not rendered) 0.69
    (not rendered) 0.69
    "ravel"        0.66
    "onto"         0.66
    " mentions"    0.65
    "કામાં"        0.65
    " diversité"   0.64
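    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). A minimal sketch of how such rankings are commonly produced, assuming the feature has a decoder direction in the residual stream that is projected through the model's unembedding matrix. The functions `logit_effects` and `top_tokens`, the d_model x vocab matrix layout, and the toy data are all hypothetical, and any rescaling applied before display is not modeled, so this does not reproduce the exact values shown.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction onto every vocabulary token.

    Assumes `unembed` is a d_model x vocab matrix (rows = residual dims),
    so the effect on token t is sum_i decoder_dir[i] * unembed[i][t].
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal d_model")
    vocab = len(unembed[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_dir, unembed))
        for t in range(vocab)
    ]


def top_tokens(effects: Sequence[float], tokens: Sequence[str], k: int,
               positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k most-promoted (positive=True) or most-suppressed tokens."""
    order = sorted(range(len(effects)), key=lambda t: effects[t], reverse=positive)
    return [(tokens[t], effects[t]) for t in order[:k]]


# Toy example: d_model = 2 and a vocabulary of 4 hypothetical tokens.
tokens = ["a", "b", "c", "d"]
decoder_dir = [1.0, 0.5]
unembed = [[0.2, -1.0, 0.4, 0.0],
           [0.4, 0.0, -0.2, 1.0]]
effects = logit_effects(decoder_dir, unembed)
print(top_tokens(effects, tokens, k=2))                  # most-promoted tokens
print(top_tokens(effects, tokens, k=1, positive=False))  # most-suppressed token
```

    Sorting indices rather than (token, value) pairs keeps ties stable and lets the same effect vector serve both the positive and the negative list.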
    Activation Density 0.069%
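    Activation density is the share of tokens on which the feature fires; at 0.069%, that is roughly 69 tokens out of every 100,000. A minimal sketch of the calculation, assuming "active" means an activation strictly above a threshold of 0.0. The function name, the threshold, and the sample data are hypothetical, and the dashboard may use a different cutoff or token sample.

```python
from typing import Sequence


def activation_density(acts: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not a documented cutoff.
    """
    if not acts:
        raise ValueError("need at least one activation")
    active = sum(1 for a in acts if a > threshold)
    return 100.0 * active / len(acts)


# Hypothetical sample: 1 active token out of 10.
print(activation_density([0.0] * 9 + [2.5]))  # → 10.0
```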

    No Known Activations