INDEX
Explanations
occurrences of the word "in"
Negative Logits
owie      -0.18
flate     -0.16
lessly    -0.16
ify       -0.15
avers     -0.15
ctors     -0.15
Dana      -0.14
sted      -0.14
asted     -0.13
ξι        -0.13
Positive Logits
depth     0.26
Depth     0.21
_depth    0.21
house     0.20
-depth    0.20
depth     0.20
Depth     0.19
situ      0.19
house     0.19
jokes     0.17
Activations Density: 0.027%