INDEX
Explanations
names of creators and companies
phrases that indicate authorship or creation
NEGATIVE LOGITS
itiveness    -0.84
itta         -0.83
aser         -0.79
vous         -0.78
uality       -0.77
asy          -0.76
imir         -0.74
Balt         -0.71
clips        -0.71
iasm         -0.70
POSITIVE LOGITS
virtue       1.12
laws         0.84
products     0.82
default      0.78
means        0.74
Wizards      0.74
statute      0.73
STATS        0.70
VID          0.70
mistake      0.69
Activations Density 0.139%