INDEX
Explanations
phrases summarizing information
Negative Logits
Token      Logit
bees       -0.81
duct       -0.78
charism    -0.77
gypt       -0.75
Wee        -0.73
eno        -0.69
enne       -0.68
train      -0.68
Sea        -0.67
ghost      -0.64
Positive Logits
Token        Logit
summ         0.88
summarize    0.82
summar       0.80
summary      0.79
lations      0.77
thereof      0.72
itious       0.72
overview     0.71
summarizes   0.71
panel        0.71
Activations Density: 0.118%