INDEX
Explanations
positive words conveying support or encouragement
recurrent phrases and expressions emphasizing longevity or continuity
Negative Logits
agre      -0.66
seiz      -0.65
ét        -0.62
rul       -0.62
accomp    -0.61
unden     -0.59
territ    -0.59
awa       -0.59
laun      -0.59
»         -0.59
Positive Logits
Anti      0.58
Straw     0.57
Fire      0.56
Coca      0.56
Coffee    0.55
coffee    0.55
Burg      0.54
bully     0.53
Kelvin    0.52
icrobial  0.52
Activations Density: 0.935%