INDEX
Explanation: personal experiences or reflections
Negative Logits
lation      -0.75
geon        -0.73
mund        -0.69
ardless     -0.67
math        -0.64
Dom         -0.62
srfAttach   -0.62
reality     -0.61
plant       -0.60
Wem         -0.60
Positive Logits
learnt      0.99
learned     0.98
wish        0.98
Learned     0.90
bucket      0.84
wanted      0.81
noticed     0.80
wished      0.79
dislike     0.79
forgot      0.77
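The two lists above rank vocabulary tokens by how strongly this feature pushes their logits up or down. A common way to get such lists is to project the feature's decoder direction onto each token's unembedding vector and keep the k most negative and k most positive scores. The sketch below assumes that approach; the dashboard's exact method (and any normalization of the scores) is not stated here, and the function names, token names, and 2-D vectors are hypothetical toy values, not real model weights.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical helper: dot product of the feature's decoder
    direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most positive, k most negative) token/score pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    bottom = ranked[:k]          # most negative first
    top = ranked[::-1][:k]       # most positive first
    return top, bottom

# Toy example with made-up 2-D vectors (assumption, not real weights).
decoder_dir = [1.0, 0.0]
unembed = {
    "tok_a": [0.9, 0.3],
    "tok_b": [0.5, -0.1],
    "tok_c": [-0.7, 0.2],
    "tok_d": [-0.2, 0.0],
}
top, bottom = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print("positive:", top)
print("negative:", bottom)
```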
Activation Density: 0.111%
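Activation density is usually read as the percentage of token positions on which the feature fires. Under that reading, 0.111% means the feature is active on roughly 1 in 900 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of 0; the dashboard's actual threshold and sampling corpus are not given, and `activation_density` is a hypothetical helper.

```python
from typing import Sequence

def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of token positions whose
    feature activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# 111 active positions out of 100,000 gives a density of about 0.111%.
acts = [0.0] * 99_889 + [1.0] * 111
print(activation_density(acts))
```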