INDEX
Explanations
positive evaluations or comments
phrases expressing affirmation or positive sentiment
Negative Logits
pora           -0.75
gypt           -0.74
cro            -0.73
guiActiveUn    -0.72
ural           -0.68
ascript        -0.66
acan           -0.66
arij           -0.65
atari          -0.64
iless          -0.64
Positive Logits
albeit             0.67
âĹ¼                0.67
congratulations    0.66
Reson              0.62
Brave              0.59
outweigh           0.58
fodder             0.58
insofar            0.57
albeit             0.57
hearted            0.57
Activations Density 0.561%
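
As a rough illustration of the density figure above, the sketch below assumes that "activations density" means the fraction of sampled token positions on which the feature fires (activation above zero). The page does not state that definition, so treat it as an assumption. The function name `activation_density` and the toy data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: density counts positions with a strictly positive activation;
    the dashboard's exact definition is not given in the source.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1,000 token positions, 6 of which activate.
toy_activations = [0.0] * 994 + [0.3, 1.2, 0.8, 2.1, 0.05, 0.9]

density = activation_density(toy_activations)
print(f"{density:.3%}")  # prints 0.600%
```

Under that reading, a density of 0.561% would mean the feature is active on roughly 5 or 6 of every 1,000 sampled tokens.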