Explanations
phrases or words related to a specific individual or entity, likely named "Der" with varying activations
references to the term "Der" in various contexts
Negative Logits
taco    -0.72
canoe   -0.66
TPS     -0.65
oka     -0.65
ドラ    -0.64
box     -0.64
ogg     -0.63
poon    -0.63
ping    -0.62
omn     -0.62
Positive Logits
Der           4.05
Der           3.16
der           1.65
der           1.38
Derby         1.29
dermat        1.29
derivatives   1.25
deriv         1.06
Die           1.05
derivative    1.04
Activations Density: 0.019%
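
The page does not state how the density figure is computed. One common reading is the fraction of sampled token positions on which the feature has a nonzero activation. The sketch below illustrates that reading only; the function name `activation_density` and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations):
    """Return the fraction of positions where the feature fired.

    Assumption: "fired" means a strictly positive activation value.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > 0)
    return fired / len(activations)


# Synthetic example: 10,000 token positions, 2 of which are active.
sample = [0.0] * 10_000
sample[10] = 4.05
sample[500] = 1.2

density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.019% would mean the feature fires on roughly 2 of every 10,000 sampled tokens.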