INDEX
Explanations
verbs related to actions in research or experimental contexts
Negative Logits
rör              -0.69
houſe            -0.67
ſmall            -0.66
ModelExpression  -0.66
getIs            -0.66
Houſe            -0.63
nahilalakip      -0.61
متعلقه           -0.61
miſ              -0.61
cauſe            -0.60
Positive Logits
in       0.86
via      0.72
by       0.71
with     0.70
on       0.69
at       0.69
through  0.66
between  0.60
as       0.59
only     0.58
Activations Density: 0.739%