    Explanations

    proper nouns and important figures in various contexts

    Negative Logits
    " latter"     -0.17
    "zione"       -0.16
    "いる"        -0.14
    "writing"     -0.14
    "兒"          -0.14
    "ت"           -0.14
    "성이"        -0.14
    "abouts"      -0.13
    "나"          -0.13
    "listed"      -0.13
    Positive Logits
    "dě"          0.17
    "sworth"      0.15
    "rophy"       0.14
    "럼"          0.14
    "/dr"         0.14
    "서"          0.14
    "itories"     0.13
    "ktop"        0.13
    "pearance"    0.13
    "pillar"      0.13
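    The page does not say how the logit lists are produced. A minimal sketch, assuming each score is the dot product of the feature's output direction with a token's unembedding vector, and that the lists are simply the k highest and k lowest scores. The helper names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(direction: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's output direction
    with that token's unembedding vector (assumed scoring rule)."""
    return {tok: sum(d * w for d, w in zip(direction, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and k most negative tokens, strongest first."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


# Toy data: a 2-dimensional direction and a four-token vocabulary (hypothetical).
direction = [1.0, -1.0]
unembed = {"a": [3.0, 1.0], "b": [0.0, 2.0], "c": [5.0, 0.0], "d": [1.0, 4.0]}
positive, negative = top_and_bottom(logit_effects(direction, unembed), k=2)
print(positive)  # [('c', 5.0), ('a', 2.0)]
print(negative)  # [('d', -3.0), ('b', -2.0)]
```

    Under this assumption, a token near the top of the positive list is one whose logit the feature pushes up most when it fires, and the negative list holds the tokens it pushes down most.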
    Activation Density 1.977%
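    The density figure is read here as the percentage of sampled tokens on which the feature's activation is above zero. That reading, the zero threshold, and the helper name `activation_density` are assumptions, not something this page states. A minimal sketch:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations over eight tokens: two are non-zero.
acts = [0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]
print(activation_density(acts))  # 25.0
```

    Under this reading, the 1.977% figure means the feature fired on roughly 2 of every 100 sampled tokens.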

    No Known Activations