INDEX
Explanations
specific sequences of characters
occurrences of a specific symbol or character
Negative Logits
metic       -0.69
Lumpur      -0.69
messing     -0.65
bund        -0.63
nuts        -0.62
querque     -0.61
obj         -0.61
omorphic    -0.61
tur         -0.61
creep       -0.60
Positive Logits
feat        1.12
requ        1.08
while       1.03
where       1.01
should      0.97
stru        0.97
inducing    0.95
enough      0.93
once        0.93
among       0.93
Activations Density: 0.024%