INDEX
Explanations
nouns that denote significant actions or effects
Negative Logits
Tome          -0.17
ongan         -0.16
ilan          -0.15
-ÐĽ           -0.14
_lv           -0.14
RAP           -0.14
ruba          -0.14
.timeScale    -0.14
otos          -0.14
ë¦            -0.14
Positive Logits
ses           0.17
.sam          0.16
pling         0.15
ele           0.15
-fetch        0.14
Pitt          0.14
rott          0.14
kari          0.14
Fro           0.14
fro           0.14
Activations Density 0.010%