INDEX
Explanations
questions beginning with "Are" that inquire about specific conditions or situations
Negative Logits
token     logit
rr        -0.17
rai       -0.16
rig       -0.16
atk       -0.16
rad       -0.15
ruc       -0.15
ør        -0.15
aValue    -0.15
re        -0.15
lek       -0.14
Positive Logits
token     logit
ospace    0.21
zzo       0.20
tha       0.19
ady       0.18
nda       0.17
putation  0.16
olar      0.16
ogle      0.15
you       0.15
IGH       0.14
Activations Density 0.048%
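
The density figure is most likely the fraction of sampled token positions on which this feature has a non-zero activation. The page does not state this definition, so treat it as an assumption. A minimal sketch of that calculation follows; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 5 of which activate.
sample = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.7, 2.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.050%
```

Under this reading, a density of 0.048% would mean the feature fires on roughly 1 in every 2,000 tokens, which fits a feature tied to a narrow pattern such as questions beginning with "Are".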