Explanations
phrases describing alignment, matching, or coincidence between different entities or concepts
phrases related to correspondence or alignment between concepts or events
Negative Logits
Token      Logit
chat       -0.60
zan        -0.59
æµ         -0.59
ËĪ         -0.59
ond        -0.58
worms      -0.57
hunted     -0.56
recomm     -0.56
beware     -0.56
abouts     -0.55
Positive Logits
Token      Logit
nicely     1.11
neatly     1.02
closely    0.92
perfectly  0.90
sharply    0.86
squarely   0.80
favorably  0.78
stark      0.76
Against    0.74
poorly     0.72
Activations Density 0.187%