Explanations
phrases indicating an expected outcome or intention
Negative Logits
Token         Logit
Fine          -0.74
ILE           -0.74
Katy          -0.74
complicate    -0.71
Panic         -0.71
Indust        -0.70
Surprise      -0.69
Dism          -0.69
Stern         -0.68
Amateur       -0.68
Positive Logits
Token         Logit
wont          0.84
Ont           0.83
alian         0.81
elf           0.81
burgh         0.79
ptr           0.77
existed       0.75
iP            0.75
regener       0.73
liner         0.73
Activations Density: 6.067%