Explanations
references to personal information and its implications in security contexts
Negative Logits
REALLY      -0.23
biraz       -0.22
trochu      -0.20
somewhat    -0.19
немного     -0.19
有点        -0.17
SOME        -0.17
Hopefully   -0.16
maybe       -0.16
perhaps     -0.16
Positive Logits
absolutely  0.36
exactly     0.31
literally   0.30
precisely   0.30
completely  0.29
perfectly   0.29
Absolutely  0.24
entirely    0.24
totally     0.23
actually    0.23
Activations Density 1.087%