Explanations
references to the term "gut" in various contexts
Negative Logits
eur      -0.18
oding    -0.16
ishing   -0.16
hw       -0.16
oded     -0.16
ql       -0.15
231      -0.15
hai      -0.15
hop      -0.15
pole     -0.15
Positive Logits
ierrez   0.31
ters     0.29
ted      0.28
gut      0.23
less     0.21
ting     0.20
tsy      0.19
ten      0.19
tae      0.18
achten   0.18
Activations Density: 0.006%