INDEX
Explanations
references to specific categories or sectors, particularly related to art or cultural expressions
Negative Logits
alist      -0.16
yll        -0.16
ilty       -0.15
aggable    -0.15
IGGER      -0.15
furt       -0.15
ely        -0.15
atatype    -0.15
enance     -0.14
altung     -0.14
Positive Logits
leg        0.20
lege       0.19
lect       0.19
loor       0.18
LECT       0.17
_wheel     0.17
abor       0.17
minh       0.16
league     0.16
cas        0.16
Activations Density: 0.011%