INDEX
    Explanations

    phrases related to regulations and legal standards

    Negative Logits
     another
    -0.43
     few
    -0.42
     some
    -0.42
     certain
    -0.42
     sometimes
    -0.41
     précis
    -0.41
    another
    -0.38
     algunas
    -0.36
     ujednoznacz
    -0.36
    -0.36
    Positive Logits
    的一切
    0.91
     everything
    0.81
     Semua
    0.80
    Everything
    0.79
    everything
    0.78
     Everything
    0.78
     EVERYTHING
    0.77
     모든
    0.74
    WriteBarrier
    0.73
     ویکی‌پدی
    0.72
    Activation Density: 0.774%

    No Known Activations