Explanations
terms related to specific criteria being met
references to specific criteria or standards
Negative Logits
orld  -0.79
joy  -0.73
resent  -0.68
ership  -0.67
ston  -0.66
irth  -0.65
lar  -0.63
lique  -0.63
hand  -0.62
leases  -0.62
Positive Logits
criteria  1.32
criterion  1.01
erion  1.00
witz  0.81
thresholds  0.78
DragonMagazine  0.77
definitions  0.77
reviewers  0.77
cutoff  0.75
dictates  0.73
Activation density: 0.015%
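A density of 0.015% means the feature fires on about 1.5 of every 10,000 tokens, so it is very sparse. The sketch below shows that arithmetic. It assumes that "density" is the percentage of token positions whose activation is above a threshold. The function name `activation_density` and the threshold of 0.0 are hypothetical choices for illustration, and the activations are made-up values, not data from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the activation exceeds `threshold`.

    Assumption: density = share of tokens with activation > threshold,
    reported as a percentage. The 0.0 threshold is illustrative only.
    """
    if not activations:
        raise ValueError("need at least one token position")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations over 20,000 tokens: the feature fires on 3 of them.
acts = [0.0] * 20_000
for i in (17, 4_210, 15_999):
    acts[i] = 2.5

print(f"{activation_density(acts):.3f}%")  # 3 / 20,000 = 0.015%
```

Three active positions out of 20,000 give 0.00015, which is 0.015% as a percentage.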