INDEX
Explanations
syntax elements related to function definitions and annotations in code
Negative Logits
Token   Logit
w       -0.65
er      -0.65
ing     -0.63
hu      -0.63
z       -0.62
w       -0.61
damn    -0.59
kh      -0.58
ed      -0.57
hu      -0.57
Positive Logits
Token   Logit
]")]    1.82
__":    1.71
}")]    1.57
__':    1.48
')")    1.44
$")     1.39
.")]    1.35
}))     1.34
}))     1.34
]$}     1.33
Activations Density: 0.035%