INDEX
Explanations
messages or communication-related words
references to messages being sent or received
Negative Logits
culus            -0.74
riages           -0.74
Simulator        -0.64
stagn            -0.62
stagnant         -0.61
Brune            -0.61
peas             -0.60
liberties        -0.58
ackets           -0.58
Railway          -0.58
Positive Logits
mails             0.76
[token missing]   0.72
greets            0.72
greeted           0.71
voic              0.70
greeting          0.69
memos             0.69
plain             0.69
[token missing]   0.67
messages          0.67
Activations Density 0.501%