INDEX
Explanations
references to cultural elements or practices
Negative Logits
OLON      -0.17
NAL       -0.16
Heath     -0.16
alar      -0.15
esel      -0.15
ç¨        -0.15
outers    -0.15
olon      -0.15
arel      -0.14
isol      -0.14
Positive Logits
cul       0.19
cul       0.17
owski     0.16
Lor       0.16
ope       0.16
erval     0.16
plorer    0.15
opo       0.15
ib        0.15
oyal      0.15
Activations Density 0.010%