INDEX
Explanations
phrases related to lists of items or categories
phrases indicating quantities or amounts, often emphasizing the word "more."
Negative Logits
Token    Logit
lance    -0.87
Joy      -0.73
Unity    -0.70
ivism    -0.69
stadt    -0.68
heed     -0.68
POST     -0.67
BIL      -0.66
rix      -0.66
gow      -0.66
Positive Logits
Token         Logit
dozen         1.13
hundred       0.95
paragraphs    0.93
consecutive   0.93
sectors       0.91
layers        0.91
thousand      0.90
episodes      0.89
segments      0.89
exceptions    0.88
Activations Density 0.047%
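
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature has a nonzero activation; the function name, the threshold parameter, and the sample values are hypothetical and used only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all. The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical per-token activations: the feature fires on 3 of 10,000 tokens.
sample = [0.0] * 9997 + [1.2, 0.4, 3.1]
density = activation_density(sample)

# Format as a percentage, in the same style as the figure reported above.
print(f"{density:.3%}")
```

Under this reading, a density of 0.047% would mean the feature fires on roughly 47 of every 100,000 tokens, which fits a sparse feature tied to a narrow pattern such as counts and enumerated items.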