INDEX
Explanations
sentences or phrases starting with certain special characters followed by words or phrases related to statements or allegations
dialogue or quotes within a narrative context
Negative Logits
minus     -0.67
steroids  -0.66
lled      -0.65
Thornton  -0.64
Hai       -0.63
derby     -0.62
bot       -0.62
abouts    -0.61
territ    -0.61
orate     -0.60
Positive Logits
"â̦               0.99
"...              0.97
³³³³³³³³          0.96
³³³³³³³³³³³³³³³³  0.89
enegger           0.87
"[                0.86
Congratulations   0.84
Today             0.83
...]              0.82
Our               0.82
Activations Density 0.087%