Explanations
sentences or phrases that express various ideas
Negative Logits
Token          Logit
quiv           -0.58
yam            -0.56
ç              -0.54
verständlich   -0.52
thering        -0.51
</em>          -0.50
s              -0.50
               -0.49
тивы           -0.49
equili         -0.48
Positive Logits
Token          Logit
ideas          1.54
IDEA           1.54
Idea           1.49
Ideas          1.44
Ideas          1.44
Idea           1.42
ideas          1.39
IDEAS          1.28
IDEA           1.23
idea           1.22
Activations Density: 0.056%