INDEX
Explanations
numerical values denoting certainty or probability, particularly when expressed as percentages
phrases indicating certainty or quantifiable metrics, especially those expressing percentages
Negative Logits
Token    Logit
rift     -0.78
vati     -0.73
rang     -0.72
colm     -0.71
olas     -0.70
isen     -0.70
netflix  -0.69
achi     -0.69
akings   -0.68
attled   -0.67
Positive Logits
Token    Logit
00000    1.21
%"       1.03
%]       0.96
0000     0.95
%;       0.90
=#       0.89
0000000  0.89
Percent  0.86
000000   0.85
000      0.85
Activations Density: 0.027%
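
The page does not state how the density figure is computed. One common reading is the fraction of sampled token positions on which the feature is active, and the sketch below assumes that definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical; the sample is sized only so that the result matches the 0.027% shown here.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition: the dashboard may compute density differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, 27 of which fire.
sample = [0.0] * 99_973 + [1.0] * 27
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.027%
```

Under this reading, a density of 0.027% means the feature fires on roughly 27 of every 100,000 tokens, which fits a narrow feature that responds mainly to percentages and runs of zeros.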