INDEX
Explanations
instances of a specific formatting pattern (Ċ followed by a number)
titles or headings related to various topics, particularly in structured formats like lists or instructions
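The `Ċ` in the first explanation, and strings such as `âĹı` in the logit lists, look like byte-level BPE token spellings, not corrupted text. The page does not name its tokenizer, so the following is only a sketch under the assumption that tokens are displayed with the GPT-2-style byte-to-character table; `bytes_to_unicode` and `decode_token` are illustrative helper names, not part of this dashboard. Under that assumption, `Ċ` is the newline byte, so "Ċ followed by a number" would mean a number at the start of a new line.

```python
def bytes_to_unicode():
    """Rebuild the byte -> printable-character table used by GPT-2-style byte-level BPE.

    Assumption: the dashboard shows tokens in this encoding; the page does not say so.
    """
    # Bytes that are already printable keep their own code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte (controls, space, etc.) is shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def decode_token(token: str) -> str:
    """Map a displayed token back to the text it stands for (hypothetical helper)."""
    raw = bytes(CHAR_TO_BYTE[c] for c in token)
    # Tokens may be fragments of a multi-byte character, so never raise on decode.
    return raw.decode("utf-8", errors="replace")


print(repr(decode_token("Ċ1")))   # '\n1'
print(decode_token("âĹı"))        # ●
```

Tokens made only of `³` are lone continuation bytes in this scheme, so they do not decode to a readable character on their own.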
Negative Logits
hement      -0.79
resc        -0.67
speeding    -0.66
citiz       -0.65
subpoen     -0.63
Sind        -0.62
Saras       -0.62
neighb      -0.61
Singh       -0.60
isconsin    -0.59
Positive Logits
³³³³³³³³³³³³³³³³    0.95
Spoiler             0.89
http                0.88
³³³³                0.82
³³³³³³³³            0.82
Unknown             0.81
Reward              0.80
https               0.80
âĹı                 0.80
Ingredients         0.77
Activations Density 0.126%