INDEX
Explanations
phrases indicating significance or importance
Negative Logits
ailability     -0.88
uffle          -0.66
ONSORED        -0.58
results        -0.57
Boom           -0.57
iqueness       -0.55
soared         -0.55
��             -0.55
similarities   -0.55
PASS           -0.55
POSITIVE LOGITS
arat           0.76
,              0.74
dstg           0.69
UNCLASSIFIED   0.64
�              0.61
–              0.60
Guant          0.60
?:             0.58
les            0.58
ional          0.57
Activations Density: 0.055%