INDEX
Explanations
references to positions and roles within various contexts
Negative Logits
éĪ        -0.17
nackte    -0.16
698       -0.16
anga      -0.15
stanbul   -0.14
723       -0.14
imb       -0.14
bol       -0.14
borg      -0.13
ensa      -0.13
POSITIVE LOGITS
arius     0.21
society   0.17
฿         0.16
ville     0.15
niche     0.15
arger     0.15
ertype    0.15
tright    0.14
among     0.14
among     0.14
Activations Density 0.160%