Explanations
concepts or notions that are framed as "ideas" related to various topics
Negative Logits
Neutral     -0.17
Builders    -0.15
endor       -0.14
Neutral     -0.14
ir          -0.14
Barry       -0.14
оре         -0.14
emp         -0.14
cg          -0.14
vit         -0.14
Positive Logits
notion      0.25
idea        0.23
concept     0.23
idea        0.23
Idea        0.22
behind      0.22
notions     0.20
Concept     0.18
premise     0.17
concepts    0.17
Activations Density 0.034%