    Explanations

    structures resembling code or programming elements

    Negative Logits
    Token    Logit
    ()")     -0.85
    }")      -0.81
    ']")     -0.80
    )")      -0.78
    ]")      -0.78
    ”]       -0.77
    }")      -0.76
    ?")      -0.76
    }`)      -0.75
    )");     -0.73

    Positive Logits
    Token    Logit
    "),       1.91
    ),        1.89
    '),       1.88
    '],       1.69
    },        1.67
    ],        1.63
    "],       1.59
    ”),       1.59
     },       1.58
     */,      1.55

    Activation Density: 2.222%

    No Known Activations