Explanations
underlying abstract concepts
Negative Logits
Sometimes    0.91
Sometimes    0.85
Although     0.78
Biasanya     0.77
Normally     0.76
因为         0.75
sometimes    0.75
sometimes    0.75
Being        0.74
因為         0.73
Positive Logits
aspects      0.97
notions      0.93
elements     0.91
آنچه         0.90
underlying   0.89
perceptions  0.89
reality      0.88
społecz      0.86
beneath      0.85
salient      0.84
Activations Density 0.130%