INDEX
Explanations
occurrences of numerical quantities
mentions of numbers, particularly "four" and "five"
Negative Logits
tch          -0.78
fw           -0.75
Dialogue     -0.65
Collider     -0.64
srfAttach    -0.63
Advertisement -0.62
haps         -0.62
verty        -0.61
volt         -0.60
agine        -0.60
Positive Logits
sides        0.98
phases       0.96
sexes        0.90
facets       0.90
ieth         0.88
genders      0.83
halves       0.80
quarters     0.79
branches     0.78
corners      0.77
Activation Density: 0.029%
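The logit lists and the density figure summarize two simple quantities. The sketch below is a minimal illustration, assuming the per-token logit values are already available as a token-to-value mapping and that a token counts as "active" when the feature's activation is strictly positive. The function names `activation_density` and `top_logits` are hypothetical and are not part of any dashboard API; the activation values in the usage lines are made up for illustration.

```python
def activation_density(activations):
    """Fraction of tokens on which the feature fires.

    Assumption: a token counts as active when its activation is > 0.
    """
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


def top_logits(logit_effects, k, positive=True):
    """Return the k (token, value) pairs with the largest values,
    or the most negative values when positive=False."""
    ordered = sorted(logit_effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ordered[:k]


# A subset of the values listed above.
effects = {"sides": 0.98, "phases": 0.96, "sexes": 0.90,
           "tch": -0.78, "fw": -0.75, "Dialogue": -0.65}

print(top_logits(effects, 2))                  # [('sides', 0.98), ('phases', 0.96)]
print(top_logits(effects, 2, positive=False))  # [('tch', -0.78), ('fw', -0.75)]

# Hypothetical activations: 3 active tokens out of 10,000.
print(f"{activation_density([0.0] * 9997 + [1.2, 0.4, 3.1]):.3%}")  # 0.030%
```

A density this low means the feature fires on only a few tokens in every ten thousand, which is consistent with a narrow concept like the one described in the explanations.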