Explanations
themes of abstract concepts
Negative Logits
Token      Value
an         0.56
ed         0.50
assembl    0.49
uoso       0.46
ار         0.45
onato      0.45
u          0.45
athe       0.45
view       0.45
ஒரு        0.44
Positive Logits
Token      Value
socialism  0.51
violência  0.48
Socialism  0.45
depresión  0.44
aggression 0.44
oor        0.44
whakam     0.44
putem      0.44
censure    0.43
fascism    0.43
Activation density: 0.084%