INDEX
Explanations
references to people or groups and their actions or characteristics
Negative Logits
adow         -0.16
ugged        -0.15
ulta         -0.15
elligent     -0.15
usercontent  -0.15
uster        -0.15
ulton        -0.15
quare        -0.15
urent        -0.14
atype        -0.14
Positive Logits
re           0.28
ve           0.22
particular   0.17
614          0.16
cco          0.16
ves          0.16
Pie          0.16
ave          0.15
shell        0.15
ll           0.15
Activations Density: 0.027%