Explanations
references to figures or numeric values
Negative Logits
rich     -0.17
lok      -0.16
ylon     -0.16
rien     -0.16
uga      -0.15
wich     -0.14
Moody    -0.14
mouth    -0.14
rne      -0.14
eties    -0.14
Positive Logits
head          0.19
.fig          0.19
tte           0.17
heads         0.17
inth          0.16
RED           0.15
headed        0.15
uration       0.15
oft           0.15
prominently   0.14
Activations Density 0.038%