Explanations
numeric values and percentages within text
Negative Logits
Token       Logit
eg          -0.16
opper       -0.15
esp         -0.15
دÙĪØ¯      -0.14
whose       -0.14
whose       -0.14
lore        -0.14
totaling    -0.14
áh          -0.14
chter       -0.13
Positive Logits
Token       Logit
compared    0.24
far         0.22
far         0.20
equivalent  0.19
ãģĨãģ¡      0.19
urette      0.19
down        0.18
enough      0.18
represent   0.17
equ         0.17
Activations Density 0.106%
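
To put the density figure in context, here is a minimal sketch of how such a number could be computed. It assumes that "density" means the fraction of token positions where the feature's activation is above zero, which the page does not confirm. The function name `activation_density` and the sample values are hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: this is how the reported density is defined; the source
    page does not state the definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 8 token positions; only one is non-zero.
sample = [0.0, 0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")

# Reading the reported 0.106% the same way: how many tokens, on average,
# pass between activations of this feature.
reported = 0.106 / 100
tokens_per_activation = 1 / reported
print(round(tokens_per_activation))
```

Under that assumption, a density of 0.106% would correspond to the feature firing on roughly one token in every 943.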