INDEX
Explanations
phrases related to the progression of time or degree
phrases indicating ongoing situations or conditions
Negative Logits
rium          -0.70
vec           -0.66
murd          -0.63
SourceFile    -0.60
illian        -0.59
Morning       -0.58
Killing       -0.58
}}}           -0.57
llular        -0.57
vation        -0.57
Positive Logits
unsuccessful      0.90
hasn              0.79
unsuccessfully    0.77
haven             0.75
successful        0.75
satisfactory      0.69
nobody            0.69
indications       0.65
there             0.64
we                0.64
Activations Density: 0.039%