INDEX
Explanations
phrases indicating frequency or repetition
phrases indicating frequency or universality
Negative Logits
mire      -0.65
poke      -0.64
ubi       -0.62
omics     -0.62
Jedi      -0.60
Dod       -0.60
angelo    -0.60
clus      -0.58
ati       -0.58
Bard      -0.58
Positive Logits
hyde          0.78
imaginable    0.72
WAYS          0.71
consist       0.68
STD           0.68
xual          0.66
occupations   0.64
conceivable   0.63
ricular       0.63
body          0.63
Activations Density: 0.073%