INDEX
Explanations
references to specific individuals or agents within a context
Negative Logits
Token               Logit
ształ               -0.81
CloseOperation      -0.80
دانشنامهٔ           -0.77
nakalista           -0.76
存于互联网档案馆     -0.71
Audiodateien        -0.68
tagez               -0.68
itſelf              -0.67
édrale              -0.67
Viki                -0.67
Positive Logits
Token               Logit
gar                 0.66
hol                 0.66
vi                  0.63
th                  0.61
gen                 0.60
ris                 0.60
rd                  0.60
dro                 0.60
gra                 0.59
ca                  0.58
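The two lists rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). A minimal sketch of how such lists can be produced, assuming the scores come from projecting the feature's decoder direction through the model's unembedding matrix (a common convention for feature dashboards, not confirmed by this page). The function names `feature_logit_effects` and `top_and_bottom`, and the toy vectors and matrix, are hypothetical:

```python
def feature_logit_effects(decoder_dir, unembed, vocab):
    """Score each token by dotting the decoder direction with its unembedding column.

    Hypothetical layout: unembed[i][j] is the weight from residual
    dimension i to vocabulary token j.
    """
    return {
        tok: sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(effects, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    return ranked[:k], ranked[::-1][:k]


# Toy 2-dimensional residual stream and 4-token vocabulary (hypothetical values).
vocab = ["gar", "hol", "Viki", "tagez"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.6, 0.5, -0.4, -0.3],
    [-0.1, -0.2, 0.5, 0.6],
]
negative, positive = top_and_bottom(feature_logit_effects(decoder_dir, unembed, vocab), k=2)
print(negative)
print(positive)
```

With these toy weights, "Viki" and "tagez" land in the negative list and "gar" and "hol" in the positive one; a real dashboard would use the model's full unembedding matrix and a larger k.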
Activation Density: 0.805%
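Activation density is reported here as a percentage. A minimal sketch, assuming the common definition (the share of evaluated tokens on which the feature's activation is strictly positive); the function name `activation_density` and the toy activations are hypothetical:

```python
def activation_density(activations: list[float]) -> float:
    """Return the percentage of tokens on which the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Hypothetical activations over 8 tokens: the feature fires on 1 of them.
print(activation_density([0.0, 0.0, 3.2, 0.0, 0.0, 0.0, 0.0, 0.0]))  # 12.5
```

Under this definition, a density of 0.805% means the feature fires on roughly 8 of every 1,000 tokens.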