INDEX
Explanations
the presence of significant keywords or phrases, particularly at the start of sentences or sections
Negative Logits
itſelf  -0.86
raiſ  -0.86
Anſ  -0.84
―――――  -0.80
uſed  -0.74
purpoſe  -0.74
poffible  -0.72
himſelf  -0.72
Kanna  -0.71
་་  -0.71
Positive Logits
de  1.12
بوابة  0.97
indd  0.92
the  0.91
Σε  0.88
di  0.87
a  0.82
OF  0.79
'>  0.76
Hentet  0.75
Activations Density 0.021%