INDEX
Explanations
words related to specific individuals or identifiers
Negative Logits
ixa      -0.16
ixer     -0.16
lds      -0.15
ousse    -0.15
ij       -0.15
lep      -0.15
ушки     -0.14
IE       -0.14
しく     -0.14
уж       -0.14
Positive Logits
above      0.20
Above      0.18
above      0.18
Above      0.18
ABOVE      0.16
Unified    0.15
rror       0.15
Beş        0.15
тор        0.15
foregoing  0.14
Activations Density 0.142%
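
The page does not say how the density figure is computed. One common reading is the fraction of token positions on which the feature fires at all. The sketch below assumes that reading; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature
    is active, which may differ from how the source tool defines it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 14 of them active.
toy = [0.0] * 9986 + [1.5] * 14
density = activation_density(toy)
print(f"{density:.3%}")
```

Under this reading, a density of 0.142% would mean the feature is active on roughly 1 in 700 token positions, which is consistent with a narrow feature tied to a small family of tokens.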