INDEX
    Explanations

    references to values with varying degrees of significance or importance in a context

    Negative Logits
    Token          Logit
    "VOKE"         -0.15
    "شان"          -0.15
    "baz"          -0.15
    "/Gate"        -0.14
    "voke"         -0.14
    "arna"         -0.14
    " rent"        -0.14
    "Sprites"      -0.14
    "odom"         -0.14
    "shore"        -0.13
    Positive Logits
    Token          Logit
    " TMPro"        0.17
    "tc"            0.15
    "td"            0.15
    "åIJĮåѦ"        0.15
    "mond"          0.15
    " frauen"       0.14
    " Separator"    0.14
    "iesel"         0.14
    "_SAMPLES"      0.14
    " Abrams"       0.13
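    The page does not say how the logit scores are computed. A common convention for SAE feature dashboards is to multiply the feature's decoder direction by the model's unembedding matrix and list the ten lowest and ten highest entries. The sketch below assumes that convention (no layer norm or bias is applied); `logit_effects`, `top_and_bottom` and all inputs are hypothetical names for illustration, not this dashboard's code.

```python
def logit_effects(decoder_row, unembed, vocab):
    """Project one feature's decoder direction through the unembedding.

    Assumption: score[t] = sum_d decoder_row[d] * unembed[d][t].
    decoder_row: list of d_model floats (hypothetical feature direction).
    unembed: d_model rows, each a list of vocab_size floats.
    vocab: list of vocab_size token strings.
    Returns a (token, score) pair for every vocabulary entry.
    """
    vocab_size = len(vocab)
    scores = [0.0] * vocab_size
    for d, weight in enumerate(decoder_row):
        row = unembed[d]
        for t in range(vocab_size):
            scores[t] += weight * row[t]
    return list(zip(vocab, scores))


def top_and_bottom(effects, k):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example: a 2-dimensional model with a 3-token vocabulary.
effects = logit_effects([1.0, -0.5], [[0.2, -0.1, 0.0], [0.1, 0.3, -0.2]], ["a", "b", "c"])
negative, positive = top_and_bottom(effects, 1)
print(negative, positive)
```

    With real weights, `top_and_bottom(effects, 10)` would produce two ten-entry lists shaped like the ones above.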
    Activation density: 0.022%
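    Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is above zero; the page does not state its threshold or sample, so both are assumptions here. Under that reading, 0.022% means roughly 22 active tokens per 100,000. A minimal sketch, with `activation_density` as a hypothetical helper:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 22 active tokens out of 100,000 sampled tokens.
print(activation_density([0.0] * 99_978 + [1.0] * 22))
```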

    No Known Activations