Explanations
words related to linguistics and language
words that describe linguistic actions or attributes
Negative Logits
Token          Logit
Seym           -0.70
raviolet       -0.69
citiz          -0.66
IFIED          -0.66
ij士           -0.64
mathemat       -0.61
intervening    -0.59
ĻĤ             -0.58
schild         -0.55
nomine         -0.54
Positive Logits
Token          Logit
bian           1.00
worth          0.99
ttes           0.99
heed           0.95
bury           0.92
gling          0.92
ham            0.91
ength          0.89
xual           0.88
own            0.87
Activations Density: 0.063%