Explanations
instances of forgetting and memory-related expressions
Negative Logits
awtextra    -0.56
pleaſure    -0.46
itſelf      -0.44
Carrière    -0.43
wiſe        -0.42
Pranala     -0.42
rshire      -0.42
Aos         -0.40
AOS         -0.40
userDao     -0.39
Positive Logits
forget      1.80
forgot      1.64
forgets     1.63
forgetting  1.59
forgotten   1.58
forget      1.55
Forget      1.51
FORGET      1.45
forgotten   1.38
Forget      1.38
Activations Density 0.160%