Explanations
references to the roles individuals or groups play within various contexts
Negative Logits
少女           -0.15
adera          -0.15
苗             -0.15
rani           -0.15
ugin           -0.14
ố              -0.14
ilor           -0.13
ople           -0.13
ivor           -0.13
659            -0.13
Positive Logits
shaping        0.19
overall        0.19
Overall        0.17
iece           0.16
Overall        0.16
society        0.15
overall        0.15
llen           0.14
developments   0.14
relation       0.14
Activations Density 0.082%