INDEX
Explanations
references to self-discovery and personal fulfillment
Negative Logits
orman      -0.19
happen     -0.16
ÃŃk        -0.15
uke        -0.15
Influence  -0.15
/Base      -0.14
ument      -0.14
ough       -0.14
loses      -0.14
chos       -0.14
Positive Logits
suits      0.22
matters    0.22
works      0.20
exc        0.20
suites     0.19
moves      0.18
lights     0.18
Matters    0.18
interests  0.17
works      0.17
Activations Density: 0.088%