INDEX
Explanations
references to safety concerns and product evaluations
Negative Logits
Regards    -0.18
olla       -0.16
ýt         -0.15
Lyons      -0.15
avis       -0.14
erset      -0.14
halt       -0.14
MOTE       -0.14
isoft      -0.14
_metric    -0.14
Positive Logits
tha        0.17
det        0.14
zens       0.14
ADVISED    0.14
spear      0.14
ãģĮãģĦ     0.13
ÙĬÙĬÙĨ     0.13
ENDED      0.13
Ķ          0.13
¯          0.13
Activations Density 0.012%