INDEX
Explanations
names of famous individuals
words related to occupations or job roles
Negative Logits
è£ħ                 -0.79
Magikarp            -0.77
CLOSE               -0.76
REDACTED            -0.73
Reviewer            -0.73
Interstitial        -0.72
ãĤµãĥ¼ãĥĨãĤ£ãĥ¯ãĥ³  -0.71
é¾                  -0.70
xual                -0.67
eers                -0.67
Positive Logits
ford                0.88
imar                0.87
endish              0.87
enna                0.83
adian               0.83
ado                 0.82
adier               0.80
rero                0.80
ony                 0.79
abit                0.77
Activations Density 0.184%