INDEX
Explanations
phrases related to abstract concepts
phrases that indicate comprehension or knowledge
Negative Logits
onies       -0.83
nar         -0.70
ads         -0.68
sites       -0.63
drops       -0.61
adding      -0.61
Textures    -0.61
piling      -0.60
yss         -0.60
quer        -0.60
Positive Logits
ually            0.94
Understanding    0.86
understanding    0.85
comprehension    0.80
ably             0.80
Understand       0.78
how              0.74
HOW              0.74
FUL              0.74
displayText      0.72
Activations Density 0.011%