INDEX
Explanations
phrases indicating ease or complications related to processes or actions
Head Attr Weights
0:0.02
1:0.02
2:0.10
3:0.39
4:0.06
5:0.04
6:0.02
7:0.04
8:0.05
9:0.08
10:0.08
11:0.06
Negative Logits
xit      -1.78
orum     -1.77
lez      -1.76
inburgh  -1.66
isode    -1.65
INGTON   -1.60
ican     -1.59
psons    -1.58
claw     -1.57
�        -1.57
Positive Logits
perce       1.87
inherently  1.84
technically 1.70
DERR        1.65
notoriously 1.64
nature      1.64
poorly      1.63
physically  1.63
thumbnails  1.57
finite      1.54
Activations Density 0.894%