INDEX
Explanations
expressions related to social interactions and leisure activities
Negative Logits
parency           -0.57
省市镇            -0.56
orsese            -0.55
TestBed           -0.53
Nuorodos          -0.52
colline           -0.52
astie             -0.51
concret           -0.51
Kase              -0.49
lapsingToolbar    -0.49
Positive Logits
hang              1.28
Hang              1.26
HANG              1.21
hung              1.16
hanging           1.14
hangs             1.08
Hang              1.08
Hanging           1.07
Hanging           1.06
hanging           1.05
Activations Density 0.054%
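
As a rough illustration of what a density of 0.054% means, the sketch below assumes that "activation density" is the fraction of token positions on which the feature's activation is above zero. That definition, the function name activation_density, and the sample data are assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density is counted per token position, with "active" meaning
    an activation strictly greater than `threshold`.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 54 of which activate the feature.
sample = [0.0] * 99_946 + [1.0] * 54
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.054%
```

Under that assumption, a density of 0.054% corresponds to the feature firing on roughly 54 out of every 100,000 tokens.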