INDEX
    Explanations

    references to nuclear weapons and their implications

    Negative Logits
     mand
    -0.15
     pap
    -0.15
    _PATCH
    -0.15
     Newark
    -0.15
     Noise
    -0.15
    Nested
    -0.14
     Nested
    -0.14
    andal
    -0.14
     Nich
    -0.14
    oin
    -0.14
    Positive Logits
     nuclear
    0.73
     Nuclear
    0.61
    核
    0.56
    uclear
    0.55
     atomic
    0.54
     nucle
    0.47
     nu
    0.47
     Atomic
    0.45
     النو
    0.43
    nu
    0.40
    Act Density 0.145%

    No Known Activations