INDEX
Explanations
references to individuals or groups in a context of care or assistance
Negative Logits
276        -0.16
("(%       -0.14
enos       -0.14
åĵ         -0.14
ż          -0.14
Boone      -0.14
ĶåĽŀ       -0.13
gee        -0.13
ajs        -0.13
adic       -0.13
Positive Logits
AGMA       0.19
eil        0.16
ãĥ³ãĥIJãĥ¼   0.15
któ        0.14
eyin       0.14
qui        0.14
razier     0.14
unde       0.14
PHA        0.14
vla        0.14
Activations Density 0.025%