INDEX
Explanations
phrases related to a specific subject or topic being questioned or discussed
Negative Logits
landmark       -0.63
anniversary    -0.61
col            -0.58
}}             -0.57
coming         -0.56
Previously     -0.56
listed         -0.56
particularly   -0.55
suprem         -0.55
bra            -0.55
Positive Logits
merely         1.26
simply         1.05
concentrate    1.02
purely         0.87
Instead        0.80
relying        0.80
foc            0.78
instead        0.75
solely         0.74
focus          0.74
Activations Density: 2.573%