INDEX
Explanations
the word "because" indicating reasons or causes within the text
Negative Logits
rai      -0.16
hani     -0.15
adel     -0.15
idan     -0.15
uj       -0.14
enders   -0.14
γέν      -0.14
ชน       -0.14
ucc      -0.14
rani     -0.14
Positive Logits
of       0.48
của      0.30
of       0.23
ของ      0.19
they     0.18
ủa       0.18
_of      0.18
της      0.17
των      0.17
of       0.17
Activations Density 0.057%