INDEX
Explanations
expressions and phrases indicating requests or responses
Head Attr Weights
0:0.02
1:0.02
2:0.04
3:0.07
4:0.15
5:0.03
6:0.05
7:0.34
8:0.03
9:0.04
10:0.07
11:0.08
Negative Logits
trenches      -1.76
ypes          -1.55
��            -1.55
vantage       -1.55
igree         -1.52
ocamp         -1.51
vironments    -1.50
groove        -1.49
kered         -1.42
ourney        -1.42
Positive Logits
inquiries     1.92
allegations   1.78
queries       1.73
criticism     1.71
requests      1.66
inaction      1.66
petitions     1.61
inquiry       1.58
baseless      1.57
criticisms    1.55
Activations Density 0.009%