INDEX
Explanations
references to individuals in positions of authority or influence within various contexts
Negative Logits
ãĥ«ãĤ¯      -0.15
afone       -0.14
omite       -0.13
èn          -0.13
-prepend    -0.13
ores        -0.13
çļĨ         -0.13
едак        -0.13
erties      -0.13
åĽ          -0.13
Positive Logits
talks       0.38
shares      0.36
discusses   0.35
share       0.34
discuss     0.34
chats       0.34
reflects    0.31
talk        0.30
talk        0.30
dishes      0.29
Activations Density 0.113%
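
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that density means the fraction of token positions on which the feature's activation is above zero; the dashboard does not state its definition, so this is an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    This is an illustrative definition, not one confirmed by the dashboard.
    """
    if not activations:
        raise ValueError("activations must contain at least one value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 of 1000 token positions.
sample = [0.0] * 999 + [2.0]
density = activation_density(sample)
print(f"Activations Density {density * 100:.3f}%")
```

Under this definition, a density of 0.113% would mean the feature fires on roughly 1 token in every 885, which is typical of a sparse feature.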