Explanations
numeric values and their context within the text
Negative Logits
ings
-0.18
redi
-0.16
ain
-0.15
ecom
-0.14
uario
-0.14
ところ
-0.14
orch
-0.14
Tits
-0.14
nik
-0.14
oid
-0.14
Positive Logits
″
0.22
선
0.20
teenth
0.20
teen
0.20
th
0.18
WD
0.18
いる
0.17
444
0.17
thane
0.17
′
0.16
Activations Density 0.178%