INDEX
Explanations
words related to household items and activities
lists and sequences of items or actions
Negative Logits
hift      -0.67
orem      -0.66
keyes     -0.66
iciary    -0.65
kamp      -0.60
cedes     -0.59
cano      -0.57
tsky      -0.57
runs      -0.57
nels      -0.57
Positive Logits
etc       1.71
etc       1.40
whatever  1.02
whatever  0.99
and       0.92
even      0.90
blah      0.87
anything  0.83
et        0.81
maybe     0.78
Activations Density: 0.163%