INDEX
Explanations
numerical quantities related to people
mentions of numbers, particularly when they refer to quantities of entities or characters
Negative Logits
Token       Logit
obin        -0.84
urden       -0.81
vention     -0.74
needs       -0.71
arist       -0.71
utable      -0.70
ook         -0.68
Redditor    -0.68
Rating      -0.68
doesn       -0.67
Positive Logits
Token          Logit
teen           1.09
teenth         1.05
dozen          1.02
unidentified   0.94
hundred        0.93
weeks          0.93
months         0.93
unnamed        0.93
consecutive    0.93
thirds         0.92
Activations Density: 0.181%