INDEX
Explanations
instances of repetition or similarity in language
Negative Logits
Token      Logit
JKLMNOP    -0.17
oure       -0.15
quo        -0.14
AFP        -0.14
oux        -0.14
::$_       -0.14
'gc        -0.14
oul        -0.14
.azure     -0.13
ila        -0.13
Positive Logits
Token        Logit
previous     0.27
previously   0.25
earlier      0.23
.previous    0.21
previous     0.21
Previously   0.21
regular      0.20
Previous     0.20
normal       0.19
usual        0.18
Activations Density: 0.147%