INDEX
Explanations
function calls and initializations in code
Negative Logits
'        -0.56
H        -0.53
corre    -0.53
O        -0.51
c        -0.51
‘        -0.51
"        -0.50
ot       -0.50
l        -0.49
h        -0.49
Positive Logits
pleaſure     1.08
ſtate        0.93
Efq          0.93
myſelf       0.92
Geplaatst    0.92
houſe        0.90
purpoſe      0.89
uſe          0.86
Tikang       0.86
fubject      0.84
Activations Density 0.054%