INDEX
Explanations
phrases indicating exceptions or exclusions
Negative Logits
Token          Logit
largeDownload  -0.67
soType         -0.65
Constructed    -0.62
igure          -0.62
WATCHED        -0.61
ustration      -0.61
ONSORED        -0.60
rift           -0.60
figure         -0.58
embold         -0.57
Positive Logits
Token   Logit
ones    1.31
ours    1.17
yours   1.15
those   1.12
those   1.03
hers    1.03
theirs  0.94
Ones    0.82
mine    0.70
one     0.69
Activations Density 0.260%