INDEX
Explanations
punctuation and special characters indicating structure in the text
Code, math, or identifiers
than, linear, tip structures
Negative Logits
Aman        -0.80
McGovern    -0.70
Aman        -0.70
Gat         -0.66
Kog         -0.63
Carver      -0.62
Harrison    -0.62
Harrison    -0.62
\{\\        -0.60
Kot         -0.60
Positive Logits
Hyman       0.78
Faj         0.76
RS          0.76
df          0.73
Julien      0.73
rainbow     0.71
RS          0.71
df          0.71
Ced         0.69
TUL         0.69
Activations Density 0.760%