Explanations
references to "things" and "stuff" in various contexts
Negative Logits
pleaſure   -1.10
myſelf     -1.09
fevere     -1.05
uſe        -1.05
juſ        -1.05
ſtre       -1.02
ſever      -1.02
juſt       -1.01
uſed       -1.01
ſet        -0.98
Positive Logits
thing      2.07
things     1.97
Thing      1.88
THING      1.82
Things     1.75
Things     1.74
THINGS     1.74
Thing      1.67
things     1.56
THING      1.50
Activations Density 0.054%