INDEX
Explanations
phrases starting with "Based on"
phrases starting with "Based on," indicating references to evidence or sources
Negative Logits
shr          -0.70
istors       -0.70
ivering      -0.67
apo          -0.67
adra         -0.67
dimension    -0.66
icut         -0.65
imm          -0.65
arthy        -0.65
avez         -0.64

Positive Logits
tesy         0.78
loosely      0.77
¥            0.75
ccording     0.72
awaru        0.72
Based        0.67
edience      0.66
citiz        0.66
veter        0.66
¶æ           0.65
Activations Density 0.014%