INDEX
Explanations
repeated instances of the word "the" and other terms indicating quantification or significance
Negative Logits
dries          -0.81
rubs           -0.69
cools          -0.68
囗             -0.66
softens        -0.65
parteci        -0.63
breathes       -0.63
resonates      -0.58
melts          -0.58
rots           -0.58
Positive Logits
use            0.81
increase       0.77
sự             0.73
việc           0.70
ệc             0.70
ajiban         0.65
simultaneous   0.61
การ            0.60
study          0.60
'\\;'          0.59
Activations Density: 0.932%