Explanations
articles and determiners in various languages
Negative Logits
unul        -0.78
بيها        -0.67
suivante    -0.67
figliu      -0.66
Majefty     -0.65
Jefus       -0.63
OFDb        -0.63
متعلقه      -0.63
favoritas   -0.61
raiſ        -0.61
Positive Logits
very          0.92
certain       0.82
kind          0.82
considerable  0.82
few           0.82
great         0.79
sort          0.76
cuantos       0.76
large         0.76
separate      0.73
Activations Density 0.014%