INDEX
Explanations
function and method declarations or calls within code
Negative Logits
Token         Logit
u             -0.62
boxy          -0.61
-             -0.59
一            -0.58
&             -0.58
o             -0.56
int           -0.56
s             -0.55
t             -0.54
:             -0.54
Positive Logits
Token         Logit
myſelf        1.20
himſelf       1.17
ſelf          1.14
themſelves    1.13
pleaſure      1.12
itſelf        1.09
ſelves        1.07
BibitemShut   0.99
leſs          0.98
Theſe         0.97
Activations Density: 0.043%