INDEX
Explanations
proper nouns or specific entities, probably related to a type of classification or identification
the term "Other" in various contexts
Negative Logits
ardo      -0.64
fres      -0.63
´         -0.62
usted     -0.60
daring    -0.58
cour      -0.56
itude     -0.56
lif       -0.54
nearly    -0.54
zos       -0.54
Positive Logits
Other     3.37
Others    2.59
Other     2.37
Others    1.99
OTHER     1.96
other     1.72
Another   1.59
other     1.54
Various   1.49
OTHER     1.34
Activations Density 0.007%