INDEX
Explanations
phrases or words related to something being acceptable or not
concepts of acceptability and standards
Negative Logits
hunt            -0.82
berry           -0.80
king            -0.79
dream           -0.78
hunter          -0.76
wan             -0.74
set             -0.73
hung            -0.73
GPU             -0.72
older           -0.72
Positive Logits
acceptable      1.04
agre            1.00
undermin        0.82
soDeliveryDate  0.81
compromises     0.81
lihood          0.81
mosqu           0.80
srfAttach       0.79
ible            0.78
GoldMagikarp    0.78
Activations Density: 0.007%