INDEX
Explanations
the name "Toby" with varying activations
references to specific names or terms associated with the character "Toby" and related variations
Negative Logits
ional     -0.81
krit      -0.71
igent     -0.71
iment     -0.69
ŃĶ        -0.66
los       -0.66
hips      -0.66
ogical    -0.65
merit     -0.64
ioned     -0.63
Positive Logits
oby       1.12
ellow     0.82
BY        0.81
ãĥį       0.77
ota       0.77
dump      0.75
pload     0.74
ãĥ³ãĤ¸    0.73
ote       0.72
utsu      0.72
Activations Density 0.010%