Explanations
words related to safety and suggestions for better practices
Negative Logits
for       -0.31
для       -0.23
untuk     -0.22
for       -0.21
για       -0.21
pentru    -0.20
für       -0.19
为        -0.18
voor      -0.18
for       -0.18
Positive Logits
purposes   0.81
sake       0.79
reasons    0.42
purpose    0.41
PURPOSE    0.34
purpose    0.32
reason     0.31
pur        0.30
Purpose    0.30
来说       0.29
Activations Density 0.674%