Explanations
list, other, remark, exercise
Negative Logits
Token       Value
(           1.05
,           0.97
reveals     0.95
wrapped     0.92
humanoid    0.91
("          0.90
Equipped    0.90
ricted      0.90
seeming     0.89
romatic     0.88
Positive Logits
Token       Value
други       1.45
другие      1.35
排序        1.25
Cadastro    1.24
остальные   1.22
інші        1.21
سایر        1.20
备注        1.17
Ejercicio   1.16
-]+         1.15
Activations Density 0.001%