Explanations
references to societal roles and expectations, particularly regarding behavior and identity
Negative Logits
affichés      -0.57
vPvB          -0.51
augmenté      -0.50
∬             -0.49
tensive       -0.48
Tbh           -0.48
rencontré     -0.48
fiquei        -0.48
éroport       -0.47
pled          -0.47
Positive Logits
life          1.19
humanity      1.05
nature        0.99
civilization  0.91
mankind       0.90
civilisation  0.89
society       0.86
humankind     0.85
everything    0.84
fate          0.81
Activations Density: 0.845%