INDEX
Explanations
phrases with colons followed by emphasis or highlighting
colons or specific segments that present detailed or categorical information
Negative Logits
ividual    -0.72
mercial    -0.60
itially    -0.57
uably      -0.56
sbm        -0.56
oided      -0.55
sembly     -0.55
athered    -0.55
ancial     -0.54
olicy      -0.54
Positive Logits
:          2.17
!:         1.53
:[         1.51
:"         1.50
:-         1.49
:(         1.48
?:         1.46
.:         1.46
*:         1.45
:'         1.40
Activation Density: 0.165%
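A density of 0.165% means the feature fires on roughly 165 of every 100,000 tokens. The sketch below shows one common way to compute such a figure. It assumes density is the percentage of tokens whose activation is strictly greater than zero. The exact threshold and token sample used for this dashboard are not given, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Usage: 165 active tokens out of 100,000 gives about 0.165%.
acts = [1.0] * 165 + [0.0] * (100_000 - 165)
print(activation_density(acts))
```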