INDEX
Explanations
terms associated with mild conditions or effects
Negative Logits
Token          Logit
^(@)           -1.30
purpoſe        -1.29
ſtate          -1.28
itſelf         -1.28
Personendaten  -1.27
houſe          -1.26
ſelves         -1.21
myſelf         -1.20
whoſe          -1.18
ſelf           -1.18
Positive Logits
Token          Logit
*(             0.96
(not shown)    0.79
*              0.67
↵↵             0.65
'              0.65
**             0.64
frac           0.63
=              0.59
.              0.58
and            0.58
Activations Density 0.624%