INDEX
Explanations
punctuation marks and specific formatting symbols
Negative Logits
Token   Logit
2       -0.74
1       -0.71
3       -0.69
T       -0.67
7       -0.61
8       -0.61
y       -0.61
Tur     -0.61
9       -0.60
tur     -0.60
Positive Logits
Token   Logit
]")]    1.59
__":    1.57
__':    1.42
endphp  1.38
__":    1.35
.],     1.31
__':    1.29
,",     1.26
.)}     1.26
."],    1.26
Activations Density: 0.211%