INDEX
Explanations
phrases related to a direction or reduction, such as the word "down" at various levels of intensity
Negative Logits
archives    -0.83
achu        -0.70
itia        -0.66
icles       -0.64
andan       -0.64
itive       -0.63
rament      -0.63
¶ħ          -0.62
ificent     -0.61
andise      -0.61
Positive Logits
LOAD        1.20
graded      1.14
stairs      1.01
grading     0.95
hill        0.90
pour        0.82
loaded      0.81
stairs      0.81
played      0.79
grades      0.78
Activations Density: 2.275%