    Negative Logits
     purpoſe        -0.99
     Majefty        -0.94
     reaſon         -0.92
    ſelf            -0.91
     Theſe          -0.90
     houſe          -0.89
     Houſe          -0.88
    ſelves          -0.87
     pleaſure       -0.87
     Reſ            -0.86
    Positive Logits
    ftagPool          0.82
    HasAnnotation     0.73
     estekak          0.72
     שוליים           0.65
    Personensuche     0.61
    ArgsConstructor   0.60
    expandindo        0.60
     المعيارى         0.59
     otomatig         0.59
    mitian            0.59
    Activation Density: 0.161%

    No Known Activations