Explanations
phrases indicating opinions or statements made by someone
Negative Logits
Token        Logit
ingu         -0.72
chnology     -0.67
onest        -0.65
omach        -0.62
renheit      -0.62
ardless      -0.62
ģ«           -0.62
anuts        -0.61
ablish       -0.61
isphere      -0.61
Positive Logits
Token        Logit
says         1.05
said         1.04
replied      1.03
tweeted      0.97
said         0.95
wrote        0.95
reads        0.93
exclaimed    0.92
remarked     0.91
commented    0.89
Activations Density: 0.080%