INDEX
Explanations
pronouns and phrases expressing personal experience or observations
Negative Logits
ッカー           -0.18
iland            -0.15
ilion            -0.15
.scalablytyped   -0.15
veloper          -0.15
اذا              -0.14
WND              -0.14
riad             -0.14
>[]              -0.14
ิว               -0.14
Positive Logits
despite           0.23
finally           0.20
although          0.20
besides           0.18
final             0.18
Finally           0.17
aside             0.17
finally           0.17
after             0.17
while             0.17
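One common way feature dashboards derive positive and negative logit lists is to project the feature's decoder direction onto the model's unembedding matrix and rank every vocabulary token by the result. The sketch below assumes that approach and ignores the final LayerNorm. The helper names (`feature_logit_effects`, `top_and_bottom`) and the toy tokens and vectors are hypothetical. This is not the dashboard's actual implementation.

```python
from typing import Dict, List, Sequence, Tuple


# Hypothetical helper: assumes `unembed` maps each token string to its column
# of the unembedding matrix W_U (length d_model). The final LayerNorm is ignored.
def feature_logit_effects(
    decoder_direction: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    # Dot product of the feature's decoder vector with each token's unembedding column
    return {
        token: sum(d * u for d, u in zip(decoder_direction, column))
        for token, column in unembed.items()
    }


# Hypothetical helper: returns the k most promoted and k most suppressed tokens.
def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by logit effect
    negative = ranked[:k]        # most strongly suppressed tokens
    positive = ranked[::-1][:k]  # most strongly promoted tokens
    return positive, negative


# Toy example with made-up tokens and 2-dimensional vectors
effects = feature_logit_effects(
    [1.0, 0.0],
    {"a": [0.3, 0.9], "b": [-0.2, 0.4], "c": [0.1, -0.5], "d": [-0.4, 0.0]},
)
positive, negative = top_and_bottom(effects, 2)
print(positive)  # → [('a', 0.3), ('c', 0.1)]
print(negative)  # → [('d', -0.4), ('b', -0.2)]
```

Under this assumption, a positive value means the feature directly pushes that token's logit up when it fires, and a negative value means it pushes the logit down.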
Activations Density 0.008%
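Activation density is the share of tokens on which the feature is active. A density of 0.008% means roughly 8 active tokens in every 100,000. The following minimal sketch assumes that "active" means an activation strictly above zero. The dashboard's exact threshold is not stated here, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable


# Hypothetical helper: assumes "active" means activation > threshold (default 0).
def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens on which the feature fires."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Toy example: 8 nonzero activations among 100,000 tokens
acts = [0.0] * 100_000
for i in range(8):
    acts[i * 1000] = 1.5
print(activation_density(acts))  # → 0.008
```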