INDEX
Explanations
adjectives or nouns indicating importance, supremacy, or priority
terms that indicate primary or dominant roles and influences
Negative Logits
Token        Logit
oops         -0.85
erved        -0.81
zanne        -0.79
earances     -0.76
ancies       -0.73
ravings      -0.72
undreds      -0.72
atri         -0.71
ires         -0.69
IRE          -0.68
Positive Logits
Token        Logit
beneficiary  1.19
conduit      1.11
culprit      1.06
source       1.03
obstacle     0.99
indicator    0.98
casualty     0.97
catalyst     0.97
contender    0.94
motiv        0.93
Activations Density: 0.171%
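
The density figure can be read as the share of sampled token positions on which the feature fires. The sketch below illustrates that reading only; it assumes "fires" means an activation strictly above zero, and the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical, not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: a feature counts as active when its activation is
    strictly greater than `threshold` (zero by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 171 of them active.
acts = [0.0] * 99_829 + [1.3] * 171
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.171%
```

Under this reading, a density of 0.171% means the feature is active on roughly 1 in 585 sampled tokens.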