INDEX
Explanations
social and cognitive concepts
Negative Logits
및             0.48
–             0.46
&             0.40
connections   0.38
results       0.37
implications  0.37
aspects       0.36
Bella         0.36
informações   0.36
assorted      0.36
Positive Logits
hegemony      0.73
tyranny       0.66
centric       0.59
supremacy     0.57
ophagy        0.56
communism     0.55
সর্ব           0.55
ogenesis      0.54
fallacy       0.54
fascism       0.54
Activations Density: 0.080%