Explanations
references to activities, places, or opportunities related to making decisions and taking action
Negative Logits
Token      Logit
ogr        -0.14
alta       -0.14
éIJ        -0.14
iná        -0.14
бо         -0.13
elt        -0.12
ãĤ¢ãĥ³     -0.12
aldo       -0.12
adh        -0.12
nown       -0.12
Positive Logits
Token      Logit
to         0.64
to         0.36
_to        0.34
to         0.32
να         0.32
zu         0.30
To         0.28
-to        0.27
Äijá»ĥ     0.27
ãĤĴ        0.26
Activations Density: 0.263%
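
Several entries in the logit lists (for example ãĤĴ, Äijá»ĥ, and ãĤ¢ãĥ³) look like raw byte-level BPE token strings rather than readable text. The sketch below shows one way to decode them. It assumes the tokenizer uses the GPT-2-style byte-to-unicode mapping, which the page does not confirm. The helper names `bytes_to_unicode` and `decode_token` are illustrative and not part of any dashboard API. Tokens that do not decode to valid UTF-8 under that assumption, or that contain characters outside the mapping, are returned unchanged.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Build the GPT-2-style byte -> printable-character table (assumed here)."""
    # Printable bytes map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at or above 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Decode a byte-level token string; return it unchanged if that fails."""
    try:
        raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
        return raw.decode("utf-8")
    except (KeyError, UnicodeDecodeError):
        # Either the token is already readable text (e.g. Greek or Cyrillic),
        # or it is a partial multi-byte sequence that cannot stand alone.
        return token


if __name__ == "__main__":
    for tok in ["ãĤĴ", "Äijá»ĥ", "ãĤ¢ãĥ³", "éIJ", "να", "to"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under this assumption, ãĤĴ decodes to the Japanese particle を, Äijá»ĥ to Vietnamese để, and ãĤ¢ãĥ³ to アン, while éIJ is an incomplete byte sequence and stays as-is. The fallback is a heuristic: a string such as iná is left untouched because its bytes are not valid UTF-8 under the mapping, which suggests the page had already rendered it as text.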