Explanations
phrases indicating ongoing or repeated actions
Negative Logits
Token        Logit
assi         -0.17
tn           -0.16
terminal     -0.14
orb          -0.14
oreach       -0.13
ryn          -0.13
top          -0.13
traction     -0.13
ely          -0.13
roph         -0.13

Positive Logits
Token        Logit
lename       0.15
aight        0.15
stylesheet   0.14
U            0.14
ugo          0.14
Sund         0.14
Wilkinson    0.14
Conv         0.14
CLA          0.14
49           0.13

Activations Density: 0.197%