Explanations
timestamps in a specific format
occurrences of punctuation marks, particularly periods
Negative Logits
Token       Logit
reluct      -0.63
finances    -0.62
coales      -0.62
neglig      -0.61
oun         -0.61
neighb      -0.60
vanity      -0.58
challeng    -0.57
spr         -0.57
conqu       -0.56
Positive Logits
Token       Logit
m           1.39
ms          1.04
mk          1.04
pm          1.02
mx          0.98
mt          0.98
fm          0.97
meter       0.97
mid         0.96
dm          0.96
Activations Density: 0.018%