Explanations
abstract concepts and states
Negative Logits
Token       Value
های         0.64
an          0.63
foolproof   0.62
时间和      0.61
ционный     0.60
a           0.57
+)$         0.55
juicy       0.55
َب          0.55
lbrakk      0.55
Positive Logits
Token       Value
.           0.86
which       0.84
as          0.78
。          0.77
which       0.74
.           0.73
They        0.72
™.          0.72
These       0.71
.*          0.71
Activations Density: 0.008%