Explanations
examples given as evidence or explanation
phrases that introduce examples or instances
Negative Logits
Token       Logit
ãĤ©         -0.71
jug         -0.71
etheless    -0.69
chant       -0.66
è¦ļéĨĴ      -0.63
livion      -0.62
Ŀ           -0.59
enthusi     -0.58
aution      -0.56
ãĤ¬         -0.55
Positive Logits
Token       Logit
example     1.78
instance    1.63
example     1.31
say         1.24
Example     1.13
Example     1.11
for         1.02
examples    1.01
for         0.99
Examples    0.99
Activations Density: 0.423%