INDEX
Explanations
negative descriptors and references to damage or problematic situations
Negative Logits
RIPT      -0.17
aney      -0.16
kud       -0.15
ugi       -0.14
ipa       -0.14
iang      -0.14
oret      -0.13
aneous    -0.13
ernels    -0.13
ildren    -0.13
Positive Logits
Tir       0.17
Mos       0.15
,         0.15
joint     0.14
.activ    0.14
aes       0.13
cler      0.13
oling     0.13
op        0.13
oki       0.13
Activations Density 0.000%