INDEX
Explanations
words with high frequencies within a specific context or grammatical category
Negative Logits
pNet        -0.16
Balt        -0.14
915         -0.14
Kinder      -0.14
_Private    -0.14
edo         -0.14
hu          -0.14
anguard     -0.14
ÚĨ          -0.13
door        -0.13
Positive Logits
ames          0.18
abant         0.15
mesh          0.13
toISOString   0.13
pha           0.13
Whites        0.13
abit          0.13
mesh          0.13
anno          0.13
_fwd          0.13
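The two lists rank tokens by how strongly this feature pushes their output logits down or up. A minimal sketch of how such a ranking can be produced, assuming (this is an assumption, not something the page states) that the logit effect is the feature's decoder direction multiplied by the model's unembedding matrix `W_U` of shape d_model x vocab_size. The function name `top_logit_effects` and the toy inputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def top_logit_effects(
    decoder_dir: Sequence[float],
    W_U: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive tokens by direct logit effect.

    Assumption: logit effect = decoder_dir @ W_U, where W_U is d_model x vocab_size.
    """
    d_model = len(decoder_dir)
    vocab_size = len(vocab)
    # Project the feature's decoder direction onto every token's unembedding column.
    logits = [
        sum(decoder_dir[r] * W_U[r][t] for r in range(d_model))
        for t in range(vocab_size)
    ]
    # Sort token indices from most negative to most positive effect.
    order = sorted(range(vocab_size), key=lambda t: logits[t])
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    # Take the k largest and list them from largest to smallest.
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return negative, positive
```

With a toy 2-dimensional model and a 4-token vocabulary, `top_logit_effects([1.0, 0.0], [[0.5, -0.2, 0.1, -0.4], [9.0, 9.0, 9.0, 9.0]], ["a", "b", "c", "d"], k=2)` ranks `d` and `b` as the most negative tokens and `a` and `c` as the most positive.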
Activation density: 0.114%
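Activation density is read here, as an assumption, as the percentage of sampled tokens on which the feature's activation exceeds zero. A minimal sketch of that calculation; the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation is above `threshold` (assumed definition)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    # Count tokens where the feature fires.
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)
```

Under this reading, a density of 0.114% means the feature fires on roughly 114 of every 100,000 tokens, which makes it a sparse feature.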