Explanations
instances of the word "just" and other similar simple adjectives or phrases
Negative Logits
mlink         -0.18
.metamodel    -0.17
Rodney        -0.15
gấp           -0.15
nbsp          -0.15
_dash         -0.15
.Server       -0.14
orts          -0.14
diff          -0.14
Buk           -0.14
Positive Logits
Cousins       0.15
ectar         0.15
auge          0.15
Misc          0.15
FY            0.15
Haus          0.15
Bookmark      0.14
hpp           0.14
ANNEL         0.14
overall       0.14
Activations Density: 0.063%