INDEX
    Explanations
    Negative Logits
    Token         Logit
    " Y"          -0.95
    "Y"           -0.91
    " si"         -0.88
    " Si"         -0.74
    "Si"          -0.59
    "ying"        -0.57
    " matters"    -0.54
    " Mooney"     -0.53
    " y"          -0.51
    "cion"        -0.50
    Positive Logits
    Token         Logit
    " itſelf"     0.98
    " Jefus"      0.89
    " cauſe"      0.87
    " ſtate"      0.87
    " myſelf"     0.84
    " quæ"        0.83
    " ſmall"      0.83
    " Reſ"        0.83
    " Monfieur"   0.82
    "protoimpl"   0.81
    Activation Density: 0.241%

    No Known Activations