INDEX
Explanations
non-specific ordinal terms denoting ranks or positions
instances of phrases or words indicating hierarchy or priority
Negative Logits
xit             -0.50
occup           -0.44
ussia           -0.43
intercepted     -0.43
mobilization    -0.42
SAM             -0.42
conversion      -0.42
conversions     -0.42
prost           -0.42
unlaw           -0.41
Positive Logits
hap             0.67
etheless        0.61
½               0.57
¤               0.56
oiler           0.56
onda            0.55
£               0.53
Nonetheless     0.53
arc             0.52
furthermore     0.52
Activations Density 0.804%