INDEX
Explanations
proper nouns related to individuals or names
references to specific names and terms associated with individuals or concepts
Negative Logits
ãĤª          -0.81
æµ           -0.80
ãĤ¶          -0.74
Privacy      -0.72
DEV          -0.72
ãĤ¹ãĥĪ       -0.69
Benz         -0.69
Instruments  -0.68
Seal         -0.67
Cities       -0.65
Positive Logits
ews          0.85
lines        0.81
aturally     0.77
orter        0.77
uala         0.73
likeness     0.73
otle         0.73
Lago         0.72
leted        0.72
graded       0.71
Activations Density 0.020%