Explanations
adverbs and quantifying phrases
phrases that indicate universality or generality
Negative Logits
ubi     -0.71
uel     -0.69
aline   -0.65
Dull    -0.64
clus    -0.63
Dod     -0.61
poke    -0.60
Bard    -0.60
bek     -0.60
uers    -0.60
Positive Logits
imaginable    0.86
hyde          0.79
STD           0.74
conceivable   0.74
partName      0.68
WAYS          0.68
代            0.67
WHERE         0.66
itsu          0.65
Ùĩ            0.65
Activations Density 0.073%
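
The density figure above can be read as the share of sampled token positions on which the feature fires. The page does not state the exact definition, so the sketch below assumes "fires" means an activation strictly above a threshold (zero by default). The function name `activation_density`, the `threshold` parameter, and the toy activations are all hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density = (number of active positions) / (total positions).
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not from the dashboard): 10,000 positions, 7 of them active.
acts = [0.0] * 10_000
for i in (3, 150, 2048, 4096, 5000, 7777, 9999):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```

Under this reading, a density of 0.073% would correspond to roughly 7 active positions in every 10,000 sampled.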