Explanations
references to communication and interaction among people
Head Attr Weights
0:0.02
1:0.01
2:0.14
3:0.18
4:0.24
5:0.02
6:0.03
7:0.11
8:0.04
9:0.03
10:0.07
11:0.05
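
As a rough reading aid, the weights above can be ranked to see which heads dominate. This is a minimal sketch: the values are copied from the listing, while the names `head_weights` and `top_heads` are hypothetical and not part of the dashboard. The listed weights are rounded, so their total need not be exactly 1.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.02, 1: 0.01, 2: 0.14, 3: 0.18, 4: 0.24, 5: 0.02,
    6: 0.03, 7: 0.11, 8: 0.04, 9: 0.03, 10: 0.07, 11: 0.05,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights (hypothetical helper)."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


total = sum(head_weights.values())   # sum of the rounded weights as listed
top = top_heads(head_weights)        # the three largest heads
share = sum(w for _, w in top) / total  # fraction of the listed weight they carry

print("top heads:", top)
print(f"share of listed weight: {share:.1%}")
```

Under these assumptions, heads 4, 3, and 2 come out on top and together carry a little under 60% of the listed weight.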
Negative Logits
ipel        -1.68
izoph       -1.57
ensable     -1.52
icted       -1.52
?????-      -1.51
ophon       -1.46
UCT         -1.44
tained      -1.41
apter       -1.40
transpired  -1.37
Positive Logits
differently  1.79
everyday     1.73
every        1.67
whereas      1.62
Mondays      1.61
Fridays      1.54
every        1.53
whenever     1.52
often        1.48
constantly   1.47
Activations Density 0.452%