Explanations
specific programming language-related keywords or function calls
Negative Logits
Token        Logit
themſelves   -0.88
iſt          -0.84
eſſ          -0.80
tranſ        -0.80
Anſ          -0.79
ſeveral      -0.78
paſſ         -0.77
Theſe        -0.76
Eſ           -0.75
neceſſ       -0.75
Positive Logits
Token        Logit
m            2.66
m            2.31
M            1.74
M            1.49
м            1.44
getM         1.28
getM         1.27
m            1.25
mR           1.17
م            1.14
Activations Density 0.152%