INDEX
Explanations
conditional phrases indicating hypothetical scenarios
Head Attr Weights
0:0.05
1:0.05
2:0.12
3:0.13
4:0.10
5:0.05
6:0.05
7:0.08
8:0.08
9:0.07
10:0.08
11:0.08
Negative Logits
ndra            -1.65
:)              -1.61
kinda           -1.60
watering        -1.59
understatement  -1.57
;)              -1.55
Redditor        -1.52
spoilers        -1.50
Rowling         -1.50
considering     -1.45
Positive Logits
atars       1.75
thood       1.67
cknow       1.60
alsh        1.54
onomous     1.52
acebook     1.47
XI          1.47
irteen      1.46
cellaneous  1.43
XV          1.41
Activations Density 0.000%