INDEX
Explanations
phrases related to challenging authority or social norms
instances of the word "dare" in various contexts
Negative Logits
urgy       -0.80
effic      -0.74
winner     -0.65
Rite       -0.63
Unity      -0.60
Methods    -0.60
Means      -0.60
iple       -0.60
entials    -0.59
ulators    -0.58
Positive Logits
dare       1.08
Dare       0.95
ngth       0.91
daring     0.86
defy       0.82
dared      0.82
boldly     0.75
geon       0.72
penetrate  0.72
provoke    0.70
Activations Density: 0.014%