INDEX
Explanations
positive experiences and descriptions of challenging situations
Negative Logits
Token        Logit
.assertThat  -0.15
ož           -0.15
only         -0.15
Wake         -0.14
orig         -0.14
agem         -0.14
INF          -0.14
esco         -0.14
lando        -0.14
tz           -0.14
Positive Logits
Token        Logit
pregnancy    0.17
Pregnancy    0.16
esse         0.15
iant         0.15
εί           0.14
kaç          0.14
opsy         0.14
CHANT        0.14
anine        0.14
ess          0.14
Activations Density: 0.182%