INDEX
Explanations
instances of the word "overhe" (and variations involving "he" and "rehe")
Negative Logits
n       -0.25
w       -0.20
l       -0.19
ss      -0.18
y       -0.18
la      -0.17
sg      -0.17
elop    -0.17
d       -0.17
nin     -0.17
Positive Logits
aring     0.25
oric      0.24
ated      0.24
uristic   0.23
ating     0.23
arts      0.22
ctic      0.22
aven      0.21
arsed     0.21
aling     0.20
Activations Density: 0.015%