Explanations
references to comparisons and relationships between entities or concepts
Negative Logits
Been       -0.15
eyin       -0.15
ynos       -0.14
radient    -0.14
oi         -0.13
iets       -0.13
っき       -0.13
átis       -0.13
حث         -0.13
øy         -0.13
Positive Logits
does       0.89
did        0.88
does       0.73
do         0.71
did        0.68
Does       0.66
Did        0.58
Did        0.56
Does       0.56
DOES       0.55
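Logit lists like the two above are typically produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the tokens with the largest and smallest resulting values. The sketch below illustrates that computation under this assumption. The function name `top_logit_tokens`, the plain-list matrix representation, and ignoring the final layer norm are all illustrative choices, not the dashboard's actual implementation, and the toy numbers in the example are made up.

```python
from typing import List, Sequence, Tuple


def top_logit_tokens(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted and k most suppressed tokens for a feature.

    Hypothetical helper. Assumes:
      decoder_dir: the feature's decoder vector, length d_model
      unembed:     the unembedding matrix W_U as d_model rows of length vocab_size
      vocab:       token strings indexed by token id
    The model's final layer norm is ignored (an assumption).
    """
    if len(unembed) != len(decoder_dir):
        raise ValueError("unembed must have one row per decoder dimension")
    vocab_size = len(vocab)
    # Direct logit effect of the feature on each token: decoder_dir @ W_U
    logits = [
        sum(d * row[t] for d, row in zip(decoder_dir, unembed))
        for t in range(vocab_size)
    ]
    # Token ids sorted from most negative to most positive logit
    order = sorted(range(vocab_size), key=lambda t: logits[t])
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return positive, negative


if __name__ == "__main__":
    # Toy, made-up values: d_model = 2, vocabulary of 4 tokens
    vocab = ["does", "did", "Been", "oi"]
    W_U = [[0.9, 0.8, -0.15, -0.1], [0.5, 0.5, 0.5, 0.5]]
    pos, neg = top_logit_tokens([1.0, 0.0], W_U, vocab, k=2)
    print(pos)  # [('does', 0.9), ('did', 0.8)]
    print(neg)  # [('Been', -0.15), ('oi', -0.1)]
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens a vectorized library would be the practical choice.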
Activation Density 0.195%
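Activation density is commonly read as the percentage of token positions on which the feature's activation is nonzero; under that reading, 0.195% means the feature fires on roughly 1 in 500 tokens (1 / 0.00195 ≈ 513). A minimal sketch, assuming that definition; the function name `activation_density` and the default threshold of 0 are hypothetical, and the dashboard's exact definition is not stated here.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where a feature's activation exceeds threshold.

    Hypothetical helper; the default threshold of 0.0 is an assumption.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    # Count positions where the feature is considered "active"
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy data: 2 active positions out of 1,000 tokens
    acts = [0.0] * 998 + [1.2, 0.4]
    print(activation_density(acts))  # 0.2
```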