INDEX
Explanations
references to themes of existential struggle and moral ambiguity
Negative Logits
antha           -0.08
ãĥ³ãĥĩ          -0.07
ayment          -0.07
δα              -0.07
bilt            -0.07
lech            -0.07
uity            -0.07
irma            -0.07
/Instruction    -0.07
sexle           -0.07
Positive Logits
Italian         0.15
Italy           0.14
Italian         0.12
Italy           0.11
italian         0.10
Italians        0.10
Gi              0.10
Aless           0.10
azz             0.10
Francesco       0.10
Activations Density: 0.605%