INDEX
Explanations
words related to proper names, potentially full names of individuals
occurrences of the sequence of letters 'us' or similar patterns within words
Negative Logits
envy        -0.83
FACE        -0.61
mathemat    -0.61
ACTED       -0.61
Curiosity   -0.60
FTWARE      -0.60
ModLoader   -0.59
resolve     -0.59
Babel       -0.59
readiness   -0.59
Positive Logits
ich         0.91
nick        0.84
eman        0.84
enberg      0.84
orst        0.83
opoulos     0.83
chuk        0.83
angan       0.83
endor       0.83
akis        0.82
Activations Density 0.315%
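
As a rough illustration of what a density figure like the one above may summarize, here is a minimal sketch. It assumes that activation density means the fraction of sampled token positions on which the feature's activation is above zero; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The source page does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 3 of 1000 token positions.
sample = [0.0] * 997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this assumed definition, a density of 0.315% would correspond to the feature firing on roughly 3 of every 1000 sampled tokens.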