INDEX
Explanations
phrases indicating importance or relevance
references to unspecified or general concepts
Negative Logits
DOM      -0.74
sels     -0.68
oÄŁ      -0.64
DOS      -0.64
nor      -0.63
Ship     -0.63
ean      -0.62
inders   -0.62
mans     -0.62
mast     -0.62
Positive Logits
Else         1.08
else         1.03
intangible   0.84
akin         0.79
worthwhile   0.75
ĪĴ           0.74
disruptive   0.74
contagious   0.73
Else         0.73
ioned        0.73
Activations Density 0.036%