INDEX
Explanations
phrases or terms known by other names
phrases that indicate alternative names or descriptions
Negative Logits
fixation      -0.71
udic          -0.70
nexus         -0.63
avorite       -0.62
osion         -0.62
adobe         -0.61
ftime         -0.60
vette         -0.59
users         -0.59
efficiency    -0.58
Positive Logits
gener         0.76
acronym       0.75
simply        0.73
literally     0.73
as            0.73
pseudonym     0.73
derog         0.72
collectively  0.72
abbrevi       0.72
abbre         0.71
Activations Density: 0.054%