INDEX
Explanations
words that are commonly used or found
the word "commonly" in various contexts
Negative Logits
ongyang     -0.73
ilion       -0.71
Fury        -0.70
Ole         -0.68
Equality    -0.67
jri         -0.66
utenberg    -0.66
anth        -0.65
anmar       -0.65
gur         -0.64
Positive Logits
abbrevi      1.07
referred     1.03
entimes      1.02
categorized  0.95
seen         0.90
encountered  0.90
regarded     0.87
ensical      0.86
known        0.85
mistaken     0.85
Activations Density: 0.026%