Explanations
phrases related to increasingly negative situations or developments
negative outcomes and deteriorating situations
Negative Logits
tein        -0.79
ools        -0.72
iciency     -0.70
obook       -0.65
icrobial    -0.65
rity        -0.63
arlane      -0.63
gemony      -0.60
phies       -0.60
rahim       -0.59
Positive Logits
roy          0.82
backstage    0.79
¯            0.75
./           0.75
downhill     0.73
ansky        0.72
for          0.72
diplom       0.71
brewing      0.67
Kenobi       0.66
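This page does not say how its two logit lists were produced. Dashboards of this kind commonly project the feature's decoder direction through the model's unembedding matrix and then report the most negative and most positive tokens. The sketch below assumes that method and uses pure Python on toy vectors. The names `logit_effects` and `top_and_bottom` and all numbers in the usage example are hypothetical.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed method: dot the feature's decoder direction with each
    token's unembedding column to get that token's logit effect."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive tokens,
    each list ordered from strongest to weakest effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

if __name__ == "__main__":
    # Hypothetical 2-dimensional decoder direction and 3-token vocabulary.
    effects = logit_effects([1.0, -0.5], {
        "downhill": [0.5, -0.5],
        "tein": [-0.5, 0.5],
        "for": [0.25, 0.0],
    })
    print(top_and_bottom(effects, k=1))
    # prints ([('tein', -0.75)], [('downhill', 0.75)])
```

On a real model the vocabulary has tens of thousands of tokens, and k = 10 gives lists shaped like the two above.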
Activations Density 0.181%
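Activation density is usually read as the share of tokens on which the feature fires. Under that assumption, 0.181% corresponds to about 181 firing tokens per 100,000. The sketch below assumes "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and the synthetic activations are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`
    (assumed definition of activation density)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)

if __name__ == "__main__":
    # Synthetic data: the feature fires on 181 of 100,000 tokens.
    acts = [1.0] * 181 + [0.0] * (100_000 - 181)
    print(f"{activation_density(acts):.3%}")  # prints 0.181%
```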