INDEX
    Explanations
    Negative Logits
        (unrendered token)  -0.78
        " is"               -0.75
        "'"                 -0.72
        " was"              -0.70
        (unrendered token)  -0.66
        " or"               -0.60
        " cui"              -0.60
        " That"             -0.58
        (whitespace token)  -0.58
        " désolés"          -0.57
    Positive Logits
        " myſelf"           0.89
        " ſeveral"          0.81
        " doubtnut"         0.77
        " Eſ"               0.74
        " Efq"              0.71
        " faſt"             0.71
        " Jefus"            0.70
        " ſind"             0.70
        " uſed"             0.69
        " ſte"              0.69
    Activation Density: 0.128%
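    A minimal sketch of how a density figure like this could be computed, assuming "density" means the percentage of tokens on which the feature's activation is strictly positive. The function name `activation_density`, the `threshold` parameter, and the toy activation list are hypothetical and not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption (hypothetical definition): a feature "fires" on a token when
    its activation is strictly greater than zero.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    # Count tokens on which the feature fires.
    firing = sum(1 for a in activations if a > threshold)
    # Express the firing count as a percentage of all tokens seen.
    return 100.0 * firing / len(activations)


# Hypothetical toy data: 1 firing token out of 800.
acts = [0.0] * 799 + [3.2]
print(f"{activation_density(acts):.3f}%")  # prints 0.125%
```

    Under this definition, a density of 0.128% means the feature fires on roughly 1.3 tokens per thousand.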

    No Known Activations