INDEX
Explanations
phrases related to responsibility and management
concepts related to authority, responsibility, and societal norms
Negative Logits
Nich        -0.69
ーテ        -0.68
Universal   -0.65
Bride       -0.64
Repeat      -0.63
Vaugh       -0.62
Course      -0.62
Materials   -0.62
Mobil       -0.61
Western     -0.61
Positive Logits
english     0.88
ername      0.72
dont        0.72
doesnt      0.71
tho         0.67
flair       0.67
didnt       0.67
sd          0.65
bud         0.64
gonna       0.64
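The two logit lists above rank vocabulary tokens by how strongly this feature pushes the model's next-token prediction toward them (positive) or away from them (negative). Below is a minimal sketch of how such lists can be produced. It assumes, as is common for sparse-autoencoder feature dashboards, that each value is the dot product of the feature's decoder direction with a token's unembedding vector. The function names, the dictionary-based unembedding, and the toy vectors are hypothetical illustrations, not the dashboard's actual code.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_direction: Sequence[float],
    unembedding: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Dot product of the feature's decoder vector with each token's unembedding vector.

    Assumption: this direct projection is how the logit values are defined.
    """
    return {
        token: sum(d * u for d, u in zip(decoder_direction, vec))
        for token, vec in unembedding.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Hypothetical toy example with a 3-dimensional residual stream.
toy_unembedding = {
    "dont": [0.5, 0.2, -0.3],
    "tho": [0.4, 0.0, 0.0],
    "Universal": [-0.2, 0.1, 0.4],
    "Course": [0.0, 0.3, 0.1],
}
effects = logit_effects([1.0, 0.0, -1.0], toy_unembedding)
positive, negative = top_and_bottom(effects, k=2)
print([t for t, _ in positive])  # ['dont', 'tho']
print([t for t, _ in negative])  # ['Universal', 'Course']
```

In a real model, the unembedding would be a matrix with one column per vocabulary entry, and the same ranking would run over the full vocabulary.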
Activation Density: 0.356%
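A density of 0.356% means the feature fires on roughly 356 of every 100,000 token positions. Below is a minimal sketch of that calculation. It assumes density is the percentage of token positions whose feature activation is above zero. The function name and the `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than zero by default.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 356 active positions out of 100,000 tokens gives a density of about 0.356%.
acts = [0.0] * 99_644 + [1.0] * 356
print(activation_density(acts))
```

Streaming over the activations, instead of storing them, keeps memory constant when the density is measured over a large token sample.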