INDEX
Explanations
messages or communication-related phrases
references to messages or communications
Negative Logits
ptions      -0.71
Zup         -0.69
chester     -0.68
iba         -0.68
borough     -0.65
unn         -0.64
lymp        -0.63
elson       -0.63
zan         -0.63
stocks      -0.63
POSITIVE LOGITS
message      3.66
message      2.65
messages     2.60
Message      2.53
Message      2.25
messaging    2.19
Messages     2.13
messenger    1.77
msg          1.48
mess         1.45
Activations Density 0.018%
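
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is computed: the fraction of token positions on which the feature's activation is above zero. This is an assumption about the metric, not a statement of how this page derives it. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 token positions, 2 of which fire.
toy_activations = [0.0] * 9998 + [1.3, 4.2]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under that reading, a density of 0.018% would mean the feature fires on roughly 18 of every 100,000 token positions.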