    Explanations

    references to specific groups or categories

    Negative Logits
    Token              Logit
    " itſelf"          -0.95
    "ſelf"             -0.81
    " Theſe"           -0.80
    " purpoſe"         -0.79
    "ItemBackground"   -0.79
    " variés"          -0.78
    "AccessorTable"    -0.78
    " ſeveral"         -0.75
    " Monfieur"        -0.74
    "的其他"           -0.74
    Positive Logits
    Token              Logit
    " two"             0.76
    "two"              0.68
    "تين"              0.63
    "respectively"     0.61
    " Two"             0.60
    "Two"              0.58
    " beiden"          0.57
    "+#+#"             0.54
    " respectively"    0.53
    " deux"            0.53
    Activation Density: 0.680%

    No Known Activations