INDEX
Explanations
instances of the word "know" signaling awareness or recognition
Negative Logits
ingen        -0.17
_superuser   -0.16
wire         -0.15
pa           -0.15
yah          -0.14
.Unity       -0.14
owl          -0.14
pa           -0.14
wine         -0.14
Sug          -0.14
Positive Logits
irsch      0.20
ãĥĥãĥĦ     0.16
odon       0.16
iland      0.15
ooth       0.15
ionales    0.15
uis        0.15
enas       0.15
.once      0.14
Ñģо        0.14
Activations Density 0.033%
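
For readers unfamiliar with the density figure, the sketch below shows one common way such a number could be computed: the fraction of token positions on which the feature's activation is above zero. This is an assumption about the metric, not a description of how this dashboard derives it. The function name `activation_density`, the `threshold` parameter, and the synthetic activations are all hypothetical and are not taken from the feature shown here.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which
    the feature fires at all. The dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic data (not from the real feature): 100,000 token positions,
# 33 of which have a non-zero activation.
acts = [0.0] * 100_000
for i in range(33):
    acts[i * 3_000] = 1.0

density = activation_density(acts)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.033% would correspond to the feature firing on roughly 33 out of every 100,000 tokens, which is consistent with a sparse feature tied to a single word sense.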