INDEX
Explanations
references to teddy bears and related concepts
Negative Logits
Token        Logit
Nix          -0.38
xit          -0.36
cnx          -0.36
Nix          -0.35
cloudflare   -0.35
hukum        -0.35
Cronin       -0.35
Morfologia   -0.35
lichkeit     -0.34
limits       -0.33
Positive Logits
Token        Logit
teddy        1.87
Teddy        1.82
Teddy        1.70
teddy        1.54
toy          0.85
Toy          0.82
Toy          0.76
🧸           0.74
TOY          0.71
Toys         0.70
Activations Density: 0.001%