INDEX
Explanations
phrases related to expressing uncertainty or seeking guidance
Negative Logits
Token       Logit
room        -0.64
oubted      -0.64
........    -0.60
odder       -0.59
piece       -0.59
peak        -0.59
iculture    -0.58
ceptions    -0.58
izu         -0.57
UM          -0.57
Positive Logits
Token       Logit
soever      1.04
beit        0.96
ls          0.92
much        0.91
itzer       0.88
ells        0.87
ling        0.82
ever        0.82
much        0.76
prevalent   0.74
Activations Density: 0.558%