INDEX
Explanations
mentions of dolphins and related terms in various contexts
Dolphins, Marlins, Canadiens, Marseille
Negative Logits
ắ	-0.47
Rag	-0.45
<h2>	-0.44
大	-0.43
pieces	-0.42
&	-0.41
Sty	-0.39
RAG	-0.38
Raw	-0.37
waste	-0.37
Positive Logits
Dolphins	2.39
Dolphin	2.09
dolphins	1.98
Dolphin	1.97
olphins	1.96
dolphin	1.87
dolphin	1.61
🐬	1.20
Marlins	0.93
DOL	0.75
Activations Density: 0.009%