INDEX
Explanations
phrases indicating the presence or absence of certain conditions or situations
phrases indicating the presence or absence of situations or conditions
Negative Logits
Sins          -0.78
stars         -0.75
Seasons       -0.74
Industries    -0.73
jas           -0.72
charms        -0.71
olor          -0.70
isms          -0.69
Characters    -0.69
isites        -0.68
Positive Logits
insufficient   1.11
no             1.10
disagreement   1.05
ample          1.05
nothing        0.91
sufficient     0.91
evidence       0.89
overlap        0.88
speculation    0.88
uncertainty    0.87
Activations Density: 0.135%