INDEX
Explanations
personal bios or descriptions of individuals
Negative Logits
Token        Logit
aris         -0.16
12           -0.14
cla          -0.13
1            -0.13
[]           -0.13
Dart         -0.13
bob          -0.13
ан           -0.13
endowed      -0.12
el           -0.12
Positive Logits
Token        Logit
uali         0.17
åĨĻ          0.15
pcs          0.15
KHTML        0.14
usercontent  0.14
ertiary      0.14
atk          0.14
ê°Ŀ          0.14
_Write       0.14
writ         0.14
Activations Density 0.072%
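
Two of the positive-logit tokens (åĨĻ and ê°Ŀ) look like raw byte-level BPE vocabulary strings rather than readable text. The page does not say which tokenizer produced them, so the following is only a sketch under the assumption that they use the GPT-2 style byte-to-unicode mapping. The helper name `decode_token` is hypothetical and is not part of any dashboard API. Under that assumption the two entries decode to 写 and 객; the first is the Chinese character for "write", which would fit alongside `_Write` and `writ`.

```python
def bytes_to_unicode():
    """Build the GPT-2 style byte -> printable-character table (assumed mapping)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes are shifted to code points starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: printable character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Hypothetical helper: turn a byte-level vocabulary string into readable text.

    Tokens containing characters outside the mapping (for example text that is
    already decoded, such as Cyrillic) are returned unchanged.
    """
    if any(ch not in CHAR_TO_BYTE for ch in token):
        return token
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # Partial multi-byte tokens may not be valid UTF-8 on their own.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["åĨĻ", "ê°Ŀ", "ан", "writ"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as `writ` pass through unchanged, so the helper can be applied to the whole token column without special-casing.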