INDEX
Explanations
phrases indicating comparison or contrast
references to the concept of reflection
Negative Logits
ccess        -0.82
load         -0.73
ciating      -0.70
corn         -0.70
gging        -0.69
access       -0.65
duct         -0.65
Anonymous    -0.64
ça           -0.64
cham         -0.62
Positive Logits
reflect      1.06
reflections  1.06
reflecting   0.99
reflects     0.96
reflection   0.94
reflected    0.93
reflect      0.91
Reflect      0.85
shards       0.75
ively        0.75
Activations Density: 0.014%