    Negative Logits
    Token              Logit
    " owning"          -0.09
    " полный"          -0.08
    "sons"             -0.08
    " sons"            -0.08
    " buitenlandse"    -0.08
    "versicherung"     -0.08
    " vaz"             -0.07
    " wasted"          -0.07
    " unnecessary"     -0.07
    "基金"             -0.07
    Positive Logits
    Token              Logit
    "字幕"             0.12
    "Throughout"       0.11
    " Throughout"      0.10
    " स्क्रीन"          0.10
    " throughout"      0.10
    " появляется"      0.09
    " serif"           0.09
    " लोगो"            0.09
    " subtitles"       0.09
    " captions"        0.09
    Activation Density: 0.009%

    No Known Activations