INDEX
    Explanations

    references to specific films and characters in media

    Negative Logits
    Token          Logit
    "owie"         -0.15
    "áli"          -0.15
    " @}"          -0.14
    "ách"          -0.14
    " Ú©ÙĦÛĮ"      -0.14
    "ergus"        -0.14
    "ä¹"           -0.13
    "působ"        -0.13
    " Claw"        -0.13
    "lyn"          -0.13
    Positive Logits
    Token          Logit
    " Avatar"      0.33
    "Avatar"       0.27
    " avatar"      0.25
    " bending"     0.24
    " Air"         0.23
    " Roku"        0.23
    " Nickel"      0.23
    " Fire"        0.22
    "avatar"       0.21
    " Sok"         0.20
    Activation Density: 0.001%

    No Known Activations