Explanations
command preceded by word or token
Negative Logits
is      0.89
h       0.87
j       0.82
yb      0.80
sale    0.77
ד       0.77
g       0.73
hh      0.73
hg      0.72
hhhh    0.71
Positive Logits
Қ               1.05
ে               1.02
ε               1.00
いますが        0.97
Cadastro        0.95
ી               0.95
混凝土          0.94
நிலை            0.93
EnglishChoose   0.93
Muscle          0.93
Activations Density: 0.004%