INDEX
Explanations
interactions and contrasts within social contexts
Negative Logits
ibar         -0.13
OfClass      -0.13
contri       -0.13
alker        -0.13
бит          -0.13
TimeStamp    -0.13
conseils     -0.13
lí           -0.12
ibile        -0.12
ンタ         -0.12
Positive Logits
(er          0.23
proverb      0.23
indeed       0.21
(s           0.20
thee         0.20
______       0.20
(            0.19
(es          0.19
_____        0.19
___          0.19
Activations Density 0.167%