Explanations
terms related to origin and identity
Negative Logits
aeda       -0.15
reater     -0.13
ilim       -0.13
unate      -0.13
elps       -0.12
reife      -0.12
224        -0.12
itted      -0.12
indered    -0.12
алу        -0.12
Positive Logits
latin      0.31
deriv      0.22
substant   0.21
latina     0.21
appell     0.21
latin      0.20
suffix     0.20
latino     0.20
signific   0.20
vocab      0.20
Activations Density 0.037%