Explanations
instances of high activation values, indicating significant emphasis or importance in the text
Negative Logits
Token       Logit
agne        -0.18
uyá»ĩn      -0.15
adera       -0.14
ãĥ³ãĤº      -0.13
%%%%%%%%    -0.13
#ad         -0.13
ausal       -0.13
ubb         -0.13
urry        -0.13
sic         -0.12
Positive Logits
Token       Logit
untu        0.16
ForEach     0.14
à¥įवव      0.14
аÑĢÑħ      0.14
aurus       0.14
foon        0.14
uptools     0.14
809         0.14
_ctor       0.14
bih         0.13
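Several tokens in the logit lists above (for example uyá»ĩn and ãĥ³ãĤº) look like raw vocabulary strings rather than readable text. The sketch below assumes, without confirmation from this page, that the model uses a GPT-2-style byte-level BPE vocabulary, in which every byte is displayed as a printable stand-in character. The helper names `bytes_to_unicode` and `decode_token` are illustrative and not part of the dashboard.

```python
def bytes_to_unicode():
    """Rebuild the byte -> display-character table used by GPT-2-style
    byte-level BPE vocabularies (assumed here, not confirmed by the page)."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes are shifted to code points 256 and above.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: display character -> original byte value.
_CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Map a vocabulary-form token back to readable text.

    Characters outside the table (already-decoded text) are passed
    through as their own UTF-8 bytes instead of raising an error.
    """
    raw = bytearray()
    for ch in token:
        if ch in _CHAR_TO_BYTE:
            raw.append(_CHAR_TO_BYTE[ch])
        else:
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["uyá»ĩn", "ãĥ³ãĤº", "аÑĢÑħ", "ForEach"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, uyá»ĩn reads as uyện, ãĥ³ãĤº as ンズ, and аÑĢÑħ as арх, while plain ASCII tokens such as ForEach are unchanged.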
Activations Density 0.035%
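To make the density figure concrete: a minimal sketch, assuming that density means the fraction of token positions on which the feature's activation is above zero. Neither that definition nor the function name `activation_density` is stated on this page, and the activation values below are synthetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    The "fires when activation > 0" convention is an assumption made
    for this illustration.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: the feature fires on 35 of 100,000 token positions.
sample = [0.0] * 99_965 + [1.2] * 35
density = activation_density(sample)

if __name__ == "__main__":
    print(f"{density:.3%}")
```

Under this reading, a density of 0.035% corresponds to the feature firing on roughly 35 out of every 100,000 tokens.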