INDEX
Explanations
mentions of being part of something
phrases that emphasize the concept of being a part of something
Negative Logits
preceded      -0.75
ares          -0.72
Sections      -0.71
etimes        -0.69
atoon         -0.68
osponsors     -0.67
roads         -0.65
houses        -0.65
hips          -0.65
ences         -0.65
Positive Logits
equation       1.40
puzzle         1.33
reason         1.14
problem        1.09
solution       1.08
rationale      0.96
bargain        0.96
charm          0.95
story          0.95
explanation    0.95
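The two lists above rank vocabulary tokens by how strongly this feature pushes their output logits up (positive) or down (negative). The source does not say how the scores are computed. The sketch below assumes one common approach: project the feature's decoder direction onto each token's unembedding vector and sort. The names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumption: a token's score is dot(feature_direction, unembedding_vector[token]).
# The dashboard's actual method is not stated in the source.


def logit_effects(direction: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot product of the feature direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(direction, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return negative, positive
```

For example, with a toy direction `[1.0, -0.5]` and unembedding vectors `{"equation": [1.0, -0.4], "puzzle": [0.9, -0.8], "preceded": [-0.5, 0.5]}`, `preceded` scores -0.75 and lands at the top of the negative list, while `puzzle` tops the positive list.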
Activation Density: 0.163%
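An activation density of 0.163% means the feature is rare: it fires on roughly 1.63 of every 1,000 tokens. The minimal sketch below assumes density is the percentage of tokens whose activation exceeds zero. The source does not define the metric, and the function name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)
```

For instance, 163 firing tokens out of 100,000 gives `100 * 163 / 100000 = 0.163`, matching the figure above.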