INDEX
Explanations
variations of the word "opinion."
Negative Logits
Friends    -0.17
ymoon      -0.17
Friends    -0.16
ocked      -0.16
ordin      -0.16
accent     -0.15
ynchron    -0.15
urance     -0.15
Beam       -0.15
friends    -0.15
Positive Logits
inions      0.30
ulent       0.27
inion       0.26
POSITE      0.24
portunity   0.24
encv        0.23
ulence      0.23
posite      0.21
ération     0.21
ium         0.20
Activations Density: 0.013%