Explanations
instances of the word "know" and its variations
Negative Logits
Token     Value
hread     -0.16
cola      -0.16
ーダ      -0.15
ural      -0.14
арод      -0.14
wizard    -0.14
shaw      -0.14
imenti    -0.14
apo       -0.14
min       -0.14
Positive Logits
Token     Value
ledge     0.21
upp       0.21
-how      0.20
ledged    0.19
liness    0.18
estar     0.16
estic     0.16
alg       0.15
lobber    0.15
erver     0.15
Activations Density: 0.142%