Explanations
repeated mentions of flags and their statuses
Negative Logits
Token       Logit
"}")        -0.69
."</        -0.65
*/)         -0.65
}}$}        -0.64
😚          -0.63
]")         -0.63
\"");       -0.63
ymce        -0.63
INSTANCE    -0.61
*/}         -0.61
Positive Logits
Token       Logit
flag        2.96
Flag        2.92
flag        2.84
flags       2.78
FLAG        2.71
Flag        2.68
Flags       2.50
FLAG        2.43
Flags       2.28
flags       2.25
Activations Density: 0.035%