INDEX
Explanations
referential pronouns and demonstratives
demonstrative pronouns/determiners across languages
Negative Logits
Token           Logit
Ause            -0.59
kasarigan       -0.54
HERO            -0.50
hors            -0.50
mouseClicked    -0.50
Scaling         -0.49
Installer       -0.48
nungszeiten     -0.48
-------         -0.48
ticides         -0.47
Positive Logits
Token           Logit
THAT            0.72
That            0.72
That            0.69
same            0.67
that            0.63
THAT            0.62
aquello         0.60
Celui           0.59
того            0.57
том             0.56
Activations Density 0.005%