INDEX
Explanations
thoughts and beliefs expressed by individuals
expressions of belief or opinion
Negative Logits
clock         -0.79
info          -0.74
inary         -0.70
aration       -0.69
rendered      -0.67
til           -0.64
css           -0.64
abi           -0.62
wa            -0.62
alia          -0.61
Positive Logits
himself       0.70
positives     0.67
olate         0.62
phas          0.62
herself       0.61
olated        0.60
passionately  0.60
nostalgia     0.60
sclerosis     0.59
optimism      0.59
Activations Density 0.267%