INDEX
    Explanations
    Negative Logits
        " purpoſe"       -1.07
        " themſelves"    -1.03
        " pleaſure"      -1.03
        " myſelf"        -1.03
        " Jefus"         -0.96
        " itſelf"        -0.95
        " uſed"          -0.95
        " himſelf"       -0.94
        " raiſ"          -0.94
        "technique"      -0.93
    Positive Logits
        " for"            0.66
        ","               0.59
        " sem"            0.58
        " ["              0.57
        " to"             0.57
                          0.57
        " like"           0.55
        " Sem"            0.55
        " with"           0.54
        " ("              0.53
    Act Density 0.145%

    No Known Activations