INDEX
    Explanations

    movie titles starting with these words

    Negative Logits
        Token       Logit
        "in"        0.66
        "ia"        0.63
        "ي"         0.63
        "م"         0.61
        "et"        0.58
        " and"      0.54
        "ik"        0.54
        "ق"         0.54
        "ل"         0.53
        "ان"        0.52

    Positive Logits
        Token       Logit
        " misog"    0.50
        "лор"       0.49
        " musul"    0.48
        " 때는"      0.48
        " तीन"      0.48
        " трех"     0.47
        "лардын"    0.47
        "而在"      0.46
        " figur"    0.46
        "жка"       0.46
    Activation Density: 0.018%

    No Known Activations
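To make the activation density figure concrete: the dashboard does not define it, but a common reading is the fraction of token positions in some evaluation set on which the feature's activation is above zero. The sketch below assumes that definition. The function name `activation_density`, the zero threshold, and the 100,000-token sample are all hypothetical and chosen only for illustration. Under this reading, 0.018% means roughly 18 firing positions per 100,000 tokens.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumed definition (not stated by the dashboard): density is the share
    of positions with activation > threshold, where threshold defaults to 0.
    """
    acts = list(acts)
    if not acts:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)


# Hypothetical sample: 100,000 token positions, feature fires on 18 of them.
acts = [0.0] * 100_000
for i in range(18):
    acts[i] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.018%"
```

With so low a density, even a large sample may contain few or no firing positions, which is consistent with a feature that is listed with no stored example activations.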