Explanations
proper nouns and names, potentially related to investigations or scandals
the letter 't'
Negative Logits
thous        -0.77
conclud      -0.77
[*           -0.71
Vaugh        -0.71
ãĥ¼ãĥĨ       -0.68
proport      -0.66
vulner       -0.65
ingred       -0.63
ModLoader    -0.62
detrim       -0.61
Positive Logits
zeb          0.68
imo          0.67
mt           0.67
ice          0.64
gallery      0.62
letters      0.61
ush          0.61
info         0.61
amin         0.61
oba          0.61
Activations Density 0.074%