Explanations
phrases indicating the start or beginning of something
instances of beginnings or initial phases of topics or events
Negative Logits
ult           -0.75
inance        -0.71
acha          -0.70
sleep         -0.69
atography     -0.68
ophon         -0.68
luaj          -0.67
elo           -0.66
leigh         -0.65
isl           -0.64
Positive Logits
-->           0.80
scratching    0.78
scratches     0.76
frontier      0.72
collateral    0.71
sympt         0.71
stim          0.66
scratched     0.65
consolation   0.65
Gate          0.64
Activations Density 0.162%