INDEX
Explanations
instances of the word "mistake"
references to mistakes or errors
Negative Logits
ighth      -0.80
electric   -0.77
rollers    -0.76
borg       -0.73
well       -0.71
markets    -0.69
zona       -0.69
vez        -0.67
tsky       -0.66
bians      -0.66
Positive Logits
mistakes         0.86
mistake          0.81
%%%%             0.73
pelled           0.71
unfocusedRange   0.68
ACTIONS          0.68
steps            0.65
mistaken         0.63
errors           0.63
Malf             0.63
Activations Density: 0.028%