Explanations
words related to authority figures or institutions
the repeated use of the word "by" in various contexts
Negative Logits
atem       -0.65
idate      -0.63
itto       -0.63
allo       -0.60
asy        -0.58
ettes      -0.56
Saharan    -0.56
abul       -0.55
ati        -0.54
chuk       -0.54
Positive Logits
virtue         1.02
laws           0.83
products       0.83
fiat           0.67
akuya          0.66
product        0.66
catch          0.65
gone           0.64
multiplying    0.63
STATS          0.60
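On dashboards of this kind, the logit lists are usually produced by projecting the feature's decoder direction through the model's unembedding matrix and then ranking the per-token scores. The sketch below shows only that ranking step and assumes a toy dense unembedding stored as nested lists. The names `top_logits`, `decoder_dir`, and `W_U` are hypothetical and do not belong to any dashboard's API.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def top_logits(
    decoder_dir: Sequence[float],      # hypothetical: feature's decoder vector (length d_model)
    W_U: Sequence[Sequence[float]],    # hypothetical: unembedding, shape d_model x vocab
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Pair], List[Pair]]:
    """Return (most positive, most negative) (token, logit) pairs."""
    d_model = len(decoder_dir)
    # logit[t] = sum_i decoder_dir[i] * W_U[i][t]  (direction projected onto each token)
    logits = [
        sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])  # ascending
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return positive, negative


pos, neg = top_logits(
    [1.0, 0.5],
    [[1.0, 0.0, -1.0, 3.0], [0.0, 4.0, -2.0, 2.0]],
    ["a", "b", "c", "d"],
    k=2,
)
print(pos)  # [('d', 4.0), ('b', 2.0)]
print(neg)  # [('c', -2.0), ('a', 1.0)]
```

A real dashboard would perform the same projection over the full vocabulary with a tensor library. Pure Python keeps the sketch self-contained.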
Activation density: 0.127% (fraction of tokens on which the feature activates)
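A density of 0.127% means the feature is active on roughly 127 of every 100,000 tokens. The sketch below computes that figure. It assumes "active" means an activation strictly above zero, and that threshold is itself an assumption. `activation_density` is a hypothetical helper, not part of any dashboard's API.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return active / total if total else 0.0


# 127 active positions out of 100,000 tokens
acts = [1.0] * 127 + [0.0] * (100_000 - 127)
density = activation_density(acts)
print(f"{density:.3%}")  # 0.127%
```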