Explanations
instances of the word "touch" and related terms
Negative Logits
idal      -0.17
quo       -0.15
ÑĪев      -0.15
ãĥ³ãĥĸ    -0.15
ÑĢава     -0.15
issors    -0.15
anto      -0.14
ntl       -0.14
uito      -0.14
isor      -0.14
Positive Logits
screens   0.28
screen    0.24
stone     0.24
stones    0.23
-screen   0.23
points    0.23
UpInside  0.22
pad       0.22
y         0.21
down      0.20
Activations Density: 0.017%