Explanations
instances of names or significant identifiers in a context, particularly those that indicate prominence or importance
Negative Logits
ãĥ£          -0.17
istik        -0.16
ازÙĩ         -0.16
allegedly    -0.15
another      -0.14
xs           -0.14
ones         -0.14
ie           -0.13
arda         -0.13
OOK          -0.13
Positive Logits
which        0.32
which        0.26
whose        0.26
Which        0.25
Which        0.23
.which       0.21
cui          0.21
WHICH        0.21
whose        0.20
who          0.19
Activations Density 0.042%
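
The density figure is most plausibly the share of sampled tokens on which this feature has a non-zero activation. The sketch below illustrates that reading only. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the page or from any tool's API.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of sampled tokens on which
    the feature fires at all; the dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 10,000 tokens, 4 of which activate the feature.
sample = [0.0] * 9996 + [1.3, 0.7, 2.1, 0.4]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.042% would mean the feature fires on roughly 4 out of every 10,000 tokens in the sampled text.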