INDEX
    Explanations

    phrases related to significant events or impactful actions, particularly those concerning loss or life-changing moments

    Negative Logits
     houſe            -0.92
     pleaſure         -0.90
     CreateTagHelper  -0.82
     neceff           -0.81
     perſon           -0.80
     uncin            -0.80
    ſelf              -0.80
     Савезне          -0.80
     argint           -0.79
     ſtate            -0.79
    Positive Logits
     been             0.87
     NSCoder          0.79
     gone             0.69
     gotten           0.69
     since            0.68
     recentemente     0.66
     fått             0.65
    miştir            0.65
     gått             0.60
     has              0.59
    Activation Density: 0.507%

    No Known Activations