Explanations
short phrases introducing a topic or statement
instances of the word "This" and other introductory or demonstrative words
Negative Logits
gomery      -0.62
itud        -0.60
aign        -0.60
ificial     -0.60
Cummings    -0.59
rall        -0.58
lie         -0.57
Lyon        -0.57
IDA         -0.57
INO         -0.57
Positive Logits
itialized   0.83
cano        0.72
ĪĴ          0.62
cknowled    0.61
Started     0.61
oran        0.61
ymes        0.60
ths         0.60
geist       0.59
Own         0.59
Activations Density 0.326%