INDEX
Explanations
phrases related to small physical objects
articles or quantifiers in various contexts
Negative Logits
Effects      -0.84
etsk         -0.82
iments       -0.80
ATURES       -0.77
arten        -0.76
izons        -0.76
words        -0.75
ItemTracker  -0.75
inson        -0.75
Examples     -0.73
Positive Logits
single       0.87
nutshell     0.82
century      0.82
dozen        0.81
defunct      0.79
gigantic     0.78
typical      0.77
particular   0.76
larger       0.75
thousand     0.75
Activations Density 0.227%