INDEX
Explanations
phrases indicating a similar sentiment or action across different contexts
Negative Logits
bang        -0.69
contrace    -0.67
vironment   -0.62
routines    -0.61
curry       -0.60
surrogate   -0.60
choir       -0.56
synthes     -0.56
channels    -0.56
covenant    -0.55
Positive Logits
IELD        0.82
ilton       0.79
Berm        0.73
ault        0.70
blance      0.69
":""},{"    0.68
^^^^        0.68
ör          0.67
sburgh      0.67
});         0.67
Activations Density 0.128%