INDEX
Explanations
references to numbers and lists
numerical rankings or order in lists
Negative Logits
Token      Logit
behavi     -0.81
authority  -0.76
tremend    -0.74
behav      -0.73
inward     -0.67
uggest     -0.67
izons      -0.65
moder      -0.65
rule       -0.65
heroine    -0.64
Positive Logits
Token      Logit
Learns     0.81
Bye        0.78
Provided   0.78
TBD        0.77
Scores     0.75
Unknown    0.71
59         0.71
YES        0.70
57         0.69
adem       0.69
Activations Density 0.117%