Explanations
references to reminders and self-reflection
Negative Logits
Token          Logit
brtc           -0.76
easeOut        -0.68
EXPERIMENTAL   -0.67
oplan          -0.66
ضب             -0.66
crose          -0.65
Phương         -0.64
colgroup       -0.64
ypso           -0.63
chesse         -0.62
Positive Logits
Token          Logit
remind         2.20
reminded       2.15
reminder       2.09
reminding      2.08
reminders      2.04
reminds        2.02
remind         1.92
Reminders      1.88
Remind         1.86
Remind         1.85
Activations Density 0.053%
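
The density figure can be read as the share of token positions on which the feature fires. The page does not state how the number is computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and chosen for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 5 of which fire.
acts = [0.0] * 10_000
for i in (12, 507, 2048, 7331, 9999):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.050%
```

Under this reading, a density of 0.053% would correspond to roughly 53 active positions per 100,000 tokens, which is consistent with a sparse feature tied to a narrow family of "remind" tokens.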