INDEX
Explanations
positive attributes or qualities
various conjunctions and phrases that qualify or describe characteristics
Negative Logits
onica     -0.88
atives    -0.78
prise     -0.75
udence    -0.74
mma       -0.74
olicy     -0.72
inar      -0.71
acas      -0.71
Else      -0.71
ctuary    -0.71
Positive Logits
furthermore     1.11
moreover        1.09
secondly        1.07
boasts          1.03
contains        1.03
possesses       1.00
consequently    1.00
therefore       0.99
ooz             0.97
hence           0.96
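The two lists above read as the tokens this feature most promotes and most suppresses. The following is a minimal sketch of how such lists could be extracted, assuming the per-token logit effects are already available as a token-to-score mapping. The function name `top_and_bottom_tokens` and the dictionary layout are hypothetical; the six scores are copied from the lists above, and nothing here reflects how the dashboard itself computes them.

```python
def top_and_bottom_tokens(logit_effects, k=2):
    """Return the k most promoted and k most suppressed tokens.

    logit_effects: dict mapping token string -> logit effect (float).
    This layout is an assumption made for illustration.
    """
    # Sort once, from highest to lowest effect.
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1], reverse=True)
    positive = ranked[:k]
    # Take the tail and reverse it so the most negative token comes first.
    negative = ranked[-k:][::-1]
    return positive, negative


# A subset of the values listed above.
effects = {
    "furthermore": 1.11,
    "moreover": 1.09,
    "secondly": 1.07,
    "onica": -0.88,
    "atives": -0.78,
    "prise": -0.75,
}

positive, negative = top_and_bottom_tokens(effects, k=2)
print(positive)  # [('furthermore', 1.11), ('moreover', 1.09)]
print(negative)  # [('onica', -0.88), ('atives', -0.78)]
```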
Activations Density 0.377%
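Activation density is commonly read as the fraction of token positions on which a feature fires. The sketch below assumes that definition, with "fires" meaning an activation above a threshold of zero; the function name, the threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds threshold.

    Assumed definition for illustration; the dashboard may compute it differently.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Made-up data: 1000 token positions, the feature fires on 4 of them.
acts = [0.0] * 996 + [0.8, 1.2, 0.3, 2.1]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.400%
```

Under this reading, a density of 0.377% would mean the feature is active on roughly 4 of every 1000 tokens.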