INDEX
Explanations
phrases indicating a series of examples or a list
phrases indicating the existence or presence of stories and facts
Negative Logits
culosis     -0.78
nesday      -0.78
oire        -0.78
iliation    -0.72
etheless    -0.72
aea         -0.71
oyer        -0.70
ility       -0.70
icka        -0.69
エル        -0.69
Positive Logits
types       0.91
examples    0.89
truths      0.87
constants   0.82
facts       0.81
guys        0.80
kinds       0.79
caveats     0.78
types       0.77
topics      0.76
Activations Density 0.064%