Explanations
words associated with challenging authority or societal norms
instances of the word "dare" and its variations, often indicating challenges or confrontations
Negative Logits
urgy       -0.74
ulator     -0.70
ulators    -0.69
effic      -0.68
Rite       -0.67
iple       -0.65
utra       -0.65
ulatory    -0.64
OTOS       -0.62
Tool       -0.62
Positive Logits
dare       1.02
Dare       0.96
daring     0.89
defy       0.86
ngth       0.83
evil       0.83
dared      0.81
roam       0.80
boldly     0.75
provoke    0.72
Activations Density 0.026%
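
The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is nonzero; the function name, the threshold parameter, and the sample data are all hypothetical and not taken from the source:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which activate the feature.
acts = [0.0] * 10_000
for i in (12, 4051, 9876):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.030%
```

Under this reading, a density of 0.026% would mean the feature fires on roughly 26 of every 100,000 sampled tokens.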