INDEX
Explanations
instances of the word "take" and its variations
Negative Logits
Token    Logit
ampa     -0.07
upe      -0.06
rike     -0.06
strup    -0.06
apur     -0.06
éłħ      -0.06
rem      -0.06
anton    -0.06
ially    -0.05
.coord   -0.05
Positive Logits
Token    Logit
shape    0.17
shape    0.14
root     0.14
Shape    0.14
Shape    0.12
hold     0.12
shapes   0.12
hape     0.11
root     0.11
_shape   0.10
Activations Density: 0.016%