INDEX
Explanations
text written in a specific format: a colon followed by a statement or message
indicators of written communication, such as quotes or citations
Negative Logits
principals    -0.70
adversaries   -0.70
reconc        -0.69
territ        -0.67
stride        -0.65
undermin      -0.64
glac          -0.62
spills        -0.61
visitation    -0.61
comprom       -0.61
Positive Logits
↑             1.25
Originally    1.08
Show          1.05
Originally    1.04
Quote         1.03
Quote         0.99
Hmm           0.98
Hi            0.97
Hello         0.95
Hey           0.95
Activations Density 0.061%