Explanations
words related to interpersonal relationships and societal roles
Negative Logits
Token        Logit
-equipped    -0.17
edBy         -0.16
sharp        -0.16
kova         -0.15
osate        -0.15
urator       -0.14
amespace     -0.14
ovu          -0.14
erset        -0.14
ات           -0.14
Positive Logits
Token        Logit
ly           0.87
LY           0.59
liness       0.56
liest        0.47
lys          0.45
lier         0.41
lyph         0.40
lies         0.38
ely          0.36
hood         0.35
Activations Density: 0.090%
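
For readers unfamiliar with the density figure, here is a minimal sketch of how such a value could be computed. It assumes density means the fraction of sampled token positions on which the feature's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" is the share of sampled token positions on which
    the feature fires. The dashboard does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 9 active positions out of 10,000.
sample = [0.0] * 9991 + [1.5] * 9
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.090%
```

Under this reading, a density of 0.090% would mean the feature fires on roughly 9 of every 10,000 sampled tokens.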