Explanations
phrases that indicate cause and effect relationships
Negative Logits
.ta              -0.14
ابط              -0.13
yles             -0.13
سانی             -0.13
impan            -0.13
pora             -0.13
urtle            -0.13
buat             -0.13
Opportunities    -0.13
uden             -0.12
Positive Logits
result           0.64
consequence      0.50
result           0.49
product          0.45
Result           0.43
.result          0.43
-result          0.42
RESULT           0.41
Result           0.39
(result          0.39
Activations Density: 0.159%