Explanations
terms related to social concepts and interactions
Negative Logits
aler      -0.16
inness    -0.15
idis      -0.14
annies    -0.14
aligned   -0.14
ulla      -0.14
gom       -0.14
áÅĻ       -0.14
ptune     -0.14
bsp       -0.14
Positive Logits
izing        0.28
ization      0.28
ize          0.25
distancing   0.24
ite          0.24
media        0.24
ized         0.23
justice      0.23
ising        0.22
-media       0.21
Activations Density: 0.031%