INDEX
Explanations
hiking with other activities
Negative Logits
지    1.48
۰    1.19
اد    1.05
sk    1.02
ной    1.02
f    1.01
that    0.99
ні    0.99
่า    0.98
که    0.97
Positive Logits
ه    1.30
a    1.17
hikers    1.11
,    1.06
hikes    0.98
hiking    0.97
Hiking    0.95
on    0.95
hiked    0.93
,’    0.92
Activation Density: 0.003%
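Activation density is usually read as the fraction of token positions on which the feature fires. Under that reading, 0.003% means roughly 3 firing positions per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above zero. The helper name `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: the feature fires on 3 of 100,000 token positions.
acts = [0.0] * 100_000
for i in (10, 5_000, 99_999):
    acts[i] = 1.2
density = activation_density(acts)
print(f"{density:.3%}")  # 0.003%
```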