Explanations
names and surnames of people
names and references to specific individuals or entities
Negative Logits
Token    Logit
osuke    -0.76
Azerb    -0.69
ourced   -0.65
IDS      -0.60
uthor    -0.60
ettle    -0.59
ADRA     -0.59
é¾       -0.58
obos     -0.57
cogn     -0.57
Positive Logits
Token    Logit
ts       1.58
t        1.51
tes      1.39
te       1.32
ta       1.27
ty       1.26
ted      1.23
tin      1.20
tt       1.20
ting     1.18
Activations Density 0.231%