INDEX
Explanations
phrases related to small examples or elements representing a larger concept
references to significant issues or underlying problems
Negative Logits
chev       -0.82
iors       -0.80
unctions   -0.75
Daily      -0.73
ancies     -0.72
OWN        -0.72
LY         -0.71
ummies     -0.69
lords      -0.67
cler       -0.66
Positive Logits
iceberg      1.39
scale        0.87
spear        0.82
scales       0.82
rope         0.78
proverbial   0.76
wedge        0.74
finger       0.74
mustard      0.72
fingers      0.67
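
The page does not say how the two logit lists are produced. One common way to build lists like these is to project a feature's decoder direction through the model's unembedding matrix and rank tokens by the resulting score. The sketch below assumes that approach; the names `logit_effects` and `top_and_bottom`, the toy vocabulary, and all numbers are hypothetical and are not taken from this feature.

```python
def logit_effects(decoder_direction, unembedding):
    """Score every vocabulary token by projecting a feature direction
    through an unembedding matrix (assumed layout: d_model rows, vocab columns)."""
    d_model = len(decoder_direction)
    vocab_size = len(unembedding[0])
    return [
        sum(decoder_direction[i] * unembedding[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


# Toy values for illustration only; they do not come from the dashboard.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
decoder_direction = [1.0, -0.5]
unembedding = [
    [1.0, 0.5, -0.5, -0.25],
    [-0.5, -0.5, 0.5, 1.5],
]

effects = logit_effects(decoder_direction, unembedding)
positive, negative = top_and_bottom(effects, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Under this reading, a large positive value (such as the 1.39 for "iceberg") would mean the feature pushes the model toward predicting that token, and a negative value would mean it pushes away from it.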
Activations Density 0.096%
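
Activation density is usually read as the fraction of sampled tokens on which the feature fires at all. The sketch below assumes that definition, with "fires" meaning an activation strictly above a threshold; the function name `activation_density`, the threshold, and the sample values are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold.

    Assumes one activation value per sampled token.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy sample for illustration only: one active token out of ten.
sample = [0.0] * 9 + [2.5]
print(f"{activation_density(sample):.3%}")
```

With this definition, a density of 0.096% would correspond to the feature being active on roughly one token in a thousand.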