INDEX
Explanations
phrases indicating causality or contribution
phrases that discuss causality and relationships between elements
Negative Logits
apse: -0.76
aby: -0.70
lege: -0.69
VI: -0.69
scribe: -0.64
must: -0.64
anything: -0.63
æ³: -0.63
aii: -0.62
apolis: -0.60
Positive Logits
demographics: 0.78
anecd: 0.74
Flavoring: 0.71
sheer: 0.68
misunder: 0.68
reluctance: 0.67
inertia: 0.66
awareness: 0.64
implicit: 0.64
underest: 0.63
Activation Density: 0.144%