INDEX
Explanations
details related to specific examples or instances within larger categories or contexts
phrases that list examples or instances of items or subjects
Negative Logits
someone    -0.66
rec        -0.63
college    -0.61
mom        -0.61
daughter   -0.61
wheels     -0.60
states     -0.60
version    -0.60
rack       -0.60
wheel      -0.60
Positive Logits
including      3.41
excluding      2.16
except         1.97
Including      1.88
especially     1.86
particularly   1.79
includes       1.76
whether        1.60
such           1.60
both           1.56
Activations Density 0.014%