INDEX
Explanations
verbs expressing logical deductions or conclusions
phrases indicating the concept of making sense or logic
Negative Logits
Token       Logit
yx          -0.71
scl         -0.69
stra        -0.69
tar         -0.67
presided    -0.65
phrine      -0.65
ILCS        -0.63
rina        -0.63
anwhile     -0.63
thia        -0.63
Positive Logits
Token       Logit
sense       1.66
hift        1.11
Sense       1.04
sense       1.00
perfect     0.98
headlines   0.92
me          0.91
mockery     0.82
ENSE        0.79
intuitive   0.77
Activations Density: 0.074%