Explanations
phrases related to figures and diagrams
references to figures or visual aids within the text
Negative Logits
TPPStreamerBot  -0.71
Sussex          -0.71
Kimmel          -0.68
ngth            -0.66
Barclays        -0.63
ש               -0.63
Passenger       -0.62
UGH             -0.62
ITY             -0.62
ת               -0.61
Positive Logits
uring           1.20
ured            1.19
uration         1.18
ures            1.15
aro             1.06
urations        1.05
uer             0.96
uers            0.92
ue              0.91
ging            0.91
Activations Density: 0.015%