INDEX
Explanations
proper nouns indicating people or places
the instances of the letter 'o' in various contexts
Negative Logits
advertisement    -0.71
maid             -0.66
heartbeat        -0.59
imaru            -0.58
racuse           -0.58
noses            -0.57
gd               -0.57
dstg             -0.57
ron              -0.56
è¦ļéĨĴ           -0.56
Positive Logits
ff       0.66
aff      0.63
za       0.63
elt      0.62
efer     0.61
apan     0.60
let      0.60
ort      0.59
iture    0.58
Aff      0.58
Activations Density 0.133%
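
As a rough illustration of what a density figure like the one above could mean, the sketch below assumes that density is the fraction of token positions on which the feature fires, that is, where its activation exceeds a threshold. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; the dashboard's actual computation is not given here.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumed definition for illustration only; the source does not state
    how its density figure is computed.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 2 active positions out of 1500 tokens.
sample = [0.0] * 1498 + [0.4, 1.2]
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.133% would correspond to the feature firing on roughly one token in 750.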