Explanations
words related to influence, impact, or change
terms related to the concept of shaping or influencing various aspects of society or personal experiences
Negative Logits
Token        Logit
unts         -0.72
Mub          -0.65
ciation      -0.64
ysis         -0.60
employed     -0.59
scrape       -0.58
istg         -0.58
blers        -0.57
typew        -0.57
itars        -0.56
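Feature dashboards of this kind typically produce both logit lists (the negative list above and the positive list that follows) by multiplying the feature's decoder direction by the model's unembedding matrix and sorting the resulting per-token scores. The source does not say how these particular values were computed, so treat the sketch below as an assumption. `feature_logits`, `top_and_bottom`, and the toy matrices are hypothetical, and real pipelines may also fold in the final LayerNorm or rescale the scores.

```python
from typing import List, Sequence, Tuple


def feature_logits(
    decoder_row: Sequence[float],
    unembed: Sequence[Sequence[float]],  # shape: d_model rows x vocab_size columns
) -> List[float]:
    """Project one feature's decoder direction through the unembedding.

    Returns one score per vocabulary entry, i.e. decoder_row @ unembed.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[d] * unembed[d][v] for d in range(len(decoder_row)))
        for v in range(vocab_size)
    ]


def top_and_bottom(
    logits: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs."""
    pairs = sorted(zip(vocab, logits), key=lambda p: p[1])  # ascending by score
    negative = pairs[:k]
    positive = pairs[::-1][:k]  # descending by score
    return negative, positive


# Hypothetical toy model: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["shaping", "destiny", "unts", "Mub"]
unembed = [
    [1.0, 0.5, -0.5, -0.5],
    [0.5, 0.5, -0.25, -0.5],
]
decoder_row = [0.5, 0.5]

logits = feature_logits(decoder_row, unembed)
negative, positive = top_and_bottom(logits, vocab, k=2)
print(negative)  # [('Mub', -0.5), ('unts', -0.375)]
print(positive)  # [('shaping', 0.75), ('destiny', 0.5)]
```

Because the scores depend only on the decoder weights and the unembedding, they describe which output tokens the feature pushes up or down directly, independent of any particular input text.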
Positive Logits
Token        Logit
perceptions   0.91
glers         0.87
shaping       0.78
destiny       0.75
ãĥį           0.71
attitudes     0.71
traject       0.70
eering        0.69
into          0.69
insula        0.69
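Tokens such as `ãĥį` look like mojibake, but they are most likely the raw form of a byte-level BPE token: GPT-2-style tokenizers map every byte to a printable character, so a token holding part of a multi-byte UTF-8 character is displayed this way. Assuming this model uses the GPT-2 byte-to-character table (the source does not state the tokenizer), the sketch below inverts that table. `bytes_to_unicode` mirrors the mapping used by GPT-2's tokenizer; `decode_token` is a hypothetical helper.

```python
from typing import Dict


def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2-style map from each byte value (0-255) to a printable character."""
    # Printable bytes map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    # Remaining bytes (control chars, space, etc.) are shifted to 256 + n.
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a displayed byte-level token back into text (hypothetical helper)."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A single token may hold an incomplete UTF-8 sequence, so don't fail on it.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥį"))       # ネ (bytes E3 83 8D, katakana "ne")
print(repr(decode_token("Ġshaping")))  # ' shaping' (Ġ encodes a leading space)
```

Under this assumption, `ãĥį` is the katakana character ネ rather than encoding debris.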
Activation Density: 0.034%
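Activation density is commonly reported as the share of sampled token positions on which the feature fires. At 0.034%, that is roughly 34 positions per 100,000 tokens, so this feature is very sparse. The sampling dataset and firing threshold behind this figure are not given here; the sketch below assumes a threshold of zero, and `activation_density` and the activation list are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose feature activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations given")
    return active / total


# Hypothetical sample: the feature fires on 34 of 100,000 token positions.
acts = [0.0] * 99_966 + [1.0] * 34
density = activation_density(acts)
print(f"{density:.3%}")  # 0.034%
```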