INDEX
Explanations
the word "ate" in various verb forms
Negative Logits
ness      -0.76
enegger   -0.75
nesses    -0.73
NESS      -0.73
ingen     -0.71
endon     -0.71
dale      -0.70
sung      -0.68
uggest    -0.68
stru      -0.67
Positive Logits
chnology  1.18
rers      1.04
llular    1.00
rer       0.82
lli       0.82
anu       0.80
lled      0.80
xus       0.75
ctic      0.72
vich      0.71
Activations Density: 0.080%