Explanations
numerical quantities or words related to quantification
Negative Logits
abs         -0.77
tops        -0.75
rovers      -0.73
haven       -0.72
dn          -0.69
duc         -0.65
stones      -0.65
bows        -0.65
models      -0.64
children    -0.64
Positive Logits
successive     1.17
iteration      0.96
participant    0.92
individual     0.88
person         0.86
where          0.84
month          0.84
piece          0.81
element        0.80
member         0.80
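The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The page does not say how the scores were computed. The sketch below assumes the common approach for sparse-autoencoder feature dashboards: take the dot product of the feature's decoder direction with each token's unembedding column, then keep the k most negative and k most positive results. The function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical, chosen only for illustration.

```python
def logit_effects(decoder_vec, unembed, vocab):
    """Score each token as decoder_vec . unembed[:, j] (assumed method).

    decoder_vec: list of d_model floats (the feature's output direction).
    unembed: d_model rows x len(vocab) columns (hypothetical toy W_U).
    """
    scores = []
    for j, token in enumerate(vocab):
        s = sum(decoder_vec[i] * unembed[i][j] for i in range(len(decoder_vec)))
        scores.append((token, s))
    return scores


def top_and_bottom(scores, k):
    """Return (k highest scores, largest first), (k lowest scores, most negative first)."""
    ranked = sorted(scores, key=lambda ts: ts[1])
    return ranked[::-1][:k], ranked[:k]


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
decoder_vec = [1.0, 0.5]
unembed = [[1.0, -1.0, 0.0],
           [0.0, 2.0, -2.0]]
top, bottom = top_and_bottom(logit_effects(decoder_vec, unembed, ["a", "b", "c"]), k=1)
print(top, bottom)
```

The ordering matches the page: the positive list is sorted from largest to smallest, and the negative list from most negative upward.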
Activation density: 0.034%
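An activation density of 0.034% (a fraction of 0.00034) means the feature fires on roughly 34 of every 100,000 tokens. The sketch below assumes that density is the share of sampled tokens whose activation exceeds zero. The page does not state its threshold or sample. The function `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation is strictly above threshold (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


density = activation_density([0.0, 0.0, 2.5, 0.0])
print(f"{density:.3%}")  # the format spec multiplies by 100 and appends '%'
```

With one firing token out of four, this prints `25.000%`. The same format applied to 0.00034 gives `0.034%`.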