    Negative Logits
    Token         Logit
    " mijne"      -0.60
    " trouw"      -0.50
    " çat"        -0.48
    " geluk"      -0.47
    " ouder"      -0.47
    "approche"    -0.46
    "she"         -0.45
    " verster"    -0.45
    " köt"        -0.44
    " leeftijd"   -0.44
    Positive Logits
    Token         Logit
    " Video"      1.20
    "Video"       1.17
    " video"      1.16
    "video"       1.09
    " VIDEO"      1.01
    " Videos"     0.98
    " videos"     0.94
    "VIDEO"       0.89
    "Videos"      0.86
    "videos"      0.85
    Activation Density: 0.016%

    No Known Activations