INDEX
Explanations
phrases indicating the introduction of a new or additional concept or subject
phrases indicating repeated instances or patterns of behavior or events
Negative Logits
Token       Logit
isters      -0.78
alties      -0.77
hips        -0.73
lee         -0.72
kee         -0.71
liest       -0.69
=-=-=-=-    -0.68
Always      -0.68
oes         -0.67
Hos         -0.65
Positive Logits
Token                Logit
worldly              1.06
notch                0.85
contender            0.79
wcs                  0.78
avenue               0.77
natureconservancy    0.76
dimension            0.76
installment          0.76
judicial             0.76
Flavoring            0.74
Activations Density 0.038%