INDEX
Explanations
phrases that list categories or options
phrases indicating the classification or categorization of concepts and reasons
Negative Logits
ergy          -0.74
Beast         -0.72
Ire           -0.71
bats          -0.71
Andromeda     -0.68
uddin         -0.66
ebook         -0.64
Downloadha    -0.62
istant        -0.62
lator         -0.59
Positive Logits
viz             1.08
%:              1.02
:-              1.00
):              0.85
simultaneously  0.85
namely          0.83
:               0.82
:(              0.80
:#              0.78
:"              0.75
Activations Density: 0.132%