Explanations
references to quantities, amounts, or measurements in the text
Words/tokens followed by "recognized", "already", "that"
concepts and their associated actions
Negative Logits
дописавши    -0.79
елның        -0.77
wrote        -0.66
حياتها       -0.65
ſche         -0.65
يتيمه        -0.64
Wrote        -0.64
Hiring       -0.63
Wearing      -0.63
Resistant    -0.63
Positive Logits
put             0.75
being           0.71
set             0.61
that            0.60
taken           0.59
successfully    0.54
which           0.54
added           0.54
used            0.53
found           0.52
Activations Density 0.842%
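
The density figure can be read as the share of token positions on which the feature fires. The page does not state how it is computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical, not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero
    (here: above-threshold) feature activation.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 8 of which activate the feature.
sample = [0.0] * 992 + [1.3, 0.2, 4.1, 0.7, 2.2, 0.9, 3.0, 0.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.800%
```

Under this reading, a density of 0.842% would mean the feature is active on roughly 8 or 9 of every 1000 tokens in whatever dataset the dashboard sampled.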