INDEX
Explanations
the word "ve" with differing activation values, potentially related to app optimization or rentals
the presence of the suffix "ve" in words
Negative Logits
£ı            -0.78
olicy         -0.75
GOODMAN       -0.72
assian        -0.69
matically     -0.67
artifacts     -0.66
administ      -0.65
omo           -0.65
wcs           -0.62
anmar         -0.62

Positive Logits
illance        1.16
mber           1.13
ttes           1.02
ggie           0.98
ller           0.97
llers          0.95
lla            0.94
tsy            0.94
tted           0.94
tt             0.94
Activations Density 0.037%
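
To put the density figure in perspective, here is a minimal sketch of how such a number could be computed. It assumes that "density" means the fraction of sampled tokens on which the feature's activation is above zero; the dashboard itself does not state its definition. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and purely illustrative.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (tokens where the feature fires) / (all tokens).
    This is an illustrative definition, not one confirmed by the source.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample: the feature fires on 37 of 100,000 tokens.
sample = [0.0] * 99_963 + [1.5] * 37
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumed definition, a density of 0.037% corresponds to the feature firing on roughly 37 out of every 100,000 tokens.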