Explanations
societal structures and outcomes
Negative Logits
você  0.94
간단  0.93
நீங்கள்  0.92
మీరు  0.91
tweaks  0.86
Você  0.84
你  0.83
ที่คุณ  0.82
youll  0.82
ඔබට  0.81
Positive Logits
commensurate  1.15
thereby  1.04
undermines  1.01
sustainably  1.00
promotes  0.98
ociety  0.94
abroad  0.92
societies  0.89
threatens  0.89
harmed  0.88
Activations Density: 0.200%