INDEX
Explanations
phrases indicating potential risks or high-stakes scenarios
Negative Logits
Token        Logit
eor          -0.15
摘           -0.15
.Interop     -0.15
lemek        -0.15
Demir        -0.14
ehen         -0.14
tallest      -0.14
/includes    -0.14
forder       -0.13
Khu          -0.13
Positive Logits
Token        Logit
future       0.21
future       0.16
upcoming     0.16
.future      0.15
gelecek      0.15
tomorrow     0.15
Future       0.14
Kramer       0.14
未来         0.14
Carr         0.14
Activations Density 0.019%
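
Token strings in dashboards like this are often displayed in byte-level BPE form, which is why a token for 未来 ("future") can show up as a run of Latin-looking characters. The sketch below decodes such strings back to text. It assumes the model's tokenizer follows the GPT-2 byte-to-character convention, which this page does not state; the function names `bytes_to_unicode` and `decode_token` are illustrative, not part of any tool shown here.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Build the GPT-2 style table mapping each byte to a printable character.

    Assumption: the tokenizer behind this dashboard uses this convention.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Turn a byte-level display string back into readable text."""
    # Strings containing characters outside the table are returned unchanged.
    if any(ch not in CHAR_TO_BYTE for ch in display):
        return display
    raw = bytes(CHAR_TO_BYTE[ch] for ch in display)
    # Partial multi-byte tokens decode to U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


# The display string æľªæĿ¥, written with escapes to avoid look-alike characters.
print(decode_token("\u00e6\u013e\u00aa\u00e6\u013f\u00a5"))  # -> 未来
```

Plain ASCII tokens such as `future` pass through unchanged, and a leading `Ġ` in a display string decodes to a leading space, which may explain why two visually identical `future` rows carry different logits.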