Explanations
mentions of personal experiences or reflections
Negative Logits
tright      -0.16
scr         -0.15
ÑıÑħ        -0.14
eyer        -0.14
scaleX      -0.14
ÑĢеÑħ       -0.13
.targets    -0.13
Sounds      -0.13
imens       -0.13
fp          -0.13
Positive Logits
thought     0.69
wanted      0.61
thought     0.60
Thought     0.59
Thought     0.54
wanted      0.52
Wanted      0.46
decided     0.36
figured     0.31
wondered    0.30
Activations Density 0.172%