INDEX
Explanations
terms related to grammatical roles and cases in language
Negative Logits
itſelf     -0.94
Anſ        -0.90
iſt        -0.84
Jefus      -0.82
Majefty    -0.81
―――――      -0.81
raiſ       -0.80
myſelf     -0.80
ſelves     -0.79
ſche       -0.78
Positive Logits
cardiaque          0.64
AndEndTag          0.63
colorés            0.62
épais              0.60
électriques        0.59
témoins            0.59
dieux              0.58
IntoConstraints    0.58
featureID          0.58
elettrico          0.57
Activations Density 0.124%