Explanations
social structures and norms
Negative Logits
Token         Logit
snappy        0.40
potongan      0.38
tastefully    0.38
yrch          0.37
subtraction   0.36
潜力          0.36
感受          0.36
احساس         0.36
toner         0.36
toning        0.36
Positive Logits
Token          Logit
society        1.41
structures     1.34
norms          1.34
institutions   1.30
society        1.24
societal       1.23
общества       1.19
societies      1.17
sociedade      1.14
Structures     1.13
Activations Density: 0.063%