INDEX
Explanations
specific nouns and their associated actions or statuses in various contexts
Negative Logits
Äįel           -0.17
stains         -0.15
ie             -0.15
ies            -0.15
iral           -0.14
gain           -0.14
_references    -0.14
Wed            -0.14
èĨ             -0.14
Äįer           -0.14
Positive Logits
ardy           0.17
rire           0.17
rale           0.17
ifestyles      0.15
Gilbert        0.14
aroo           0.14
682            0.14
ury            0.14
Fury           0.14
errupted       0.14
Activations Density 0.049%