INDEX
Explanations
passive forms of communication where information is shared or exchanged
instances of the phrase "I was told" or similar constructions indicating received information or directives
Negative Logits
adesh       -0.72
ivot        -0.71
Labor       -0.69
ADD         -0.62
ications    -0.62
insula      -0.61
odox        -0.60
limb        -0.59
asu         -0.59
aband       -0.59
Positive Logits
tale        0.86
mith        0.74
goodbye     0.71
tales       0.69
permission  0.69
Ń·          0.68
ļé          0.66
repeatedly  0.65
esson       0.65
ĨĴ          0.64
Activations Density: 0.031%