Explanations
phrases and questions centered around opinions and thoughts
asking for opinion
Negative Logits
disambiguazione     -0.59
ंदीखरीदारी          -0.50
astify              -0.48
оригіналу           -0.47
estekak             -0.47
Roskov              -0.46
Diweddarwch         -0.46
mobileqq            -0.46
tartalomajánló      -0.45
esclavos            -0.45
Positive Logits
think               0.63
opinion             0.58
opinion             0.58
Opinion             0.53
Opinion             0.50
imagin              0.50
think               0.50
opinions            0.49
Critique            0.48
proportion          0.47
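One common way to produce lists like the two above is to score every vocabulary token by how strongly the feature pushes its logit, then keep the extremes at each end. The sketch below assumes (this is not confirmed by the page) that each score is the dot product of the feature's decoder direction with that token's unembedding vector; `logit_effects` and `top_and_bottom_k` are hypothetical helper names, and the vectors in the usage example are toy values, not real model weights.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def logit_effects(direction: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    # Assumed scoring rule: dot product of the feature's decoder direction
    # with each token's unembedding vector.
    return {tok: sum(d * u for d, u in zip(direction, vec)) for tok, vec in unembed.items()}


def top_and_bottom_k(scores: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    # Sort ascending by score; the front holds the most negative tokens.
    ranked = sorted(scores.items(), key=lambda item: item[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom


# Toy usage with made-up 2-d vectors.
direction = [1.0, 0.0]
unembed = {
    "think": [0.6, 0.1],
    "opinion": [0.5, 0.0],
    "esclavos": [-0.4, 0.2],
    "Roskov": [-0.5, 0.3],
}
top, bottom = top_and_bottom_k(logit_effects(direction, unembed), k=2)
print(top)     # [('think', 0.6), ('opinion', 0.5)]
print(bottom)  # [('Roskov', -0.5), ('esclavos', -0.4)]
```

Repeated-looking entries such as "opinion" or "think" would be distinct vocabulary entries in a real tokenizer (for example with and without a leading space), which is why they can appear more than once.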
Activation Density: 0.008%
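Activation density is usually read as the share of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of 0.0; the dashboard's exact threshold and token sample are not given here. `activation_density` is a hypothetical helper, and the example only shows what 0.008% means arithmetically: about 8 firing positions per 100,000 tokens.

```python
import math
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 8 firing positions out of 100,000 tokens gives a density of 0.008%.
acts = [0.0] * 99_992 + [1.0] * 8
print(activation_density(acts))
print(math.isclose(activation_density(acts), 0.008))  # True
```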