INDEX
Explanations
phrases that indicate frequency or probability
phrases that indicate frequency or typicality in actions or statements
Negative Logits
andering   -0.75
imm        -0.71
enting     -0.67
driving    -0.64
çĶ         -0.62
rising     -0.61
gold       -0.61
ä½ľ        -0.60
ander      -0.59
å®         -0.59
Positive Logits
entimes    1.22
terness    1.02
etheless   0.97
yip        0.95
veyard     0.91
eus        0.85
Called     0.85
ccording   0.84
asionally  0.83
nomine     0.81
Activations Density 0.012%
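
A few entries in the logit lists above (çĶ, ä½ľ, å®) look like encoding debris but are more likely raw vocabulary tokens. The sketch below assumes the dashboard prints tokens in the GPT-2-style byte-level BPE alphabet, where every byte is shown as one printable character; the source does not name the tokenizer, so treat this as an assumption. Under that assumption, mapping each character back to its byte shows that ä½ľ is the complete UTF-8 encoding of 作, while çĶ and å® are only the leading bytes of a three-byte character. The helper name `token_bytes` is hypothetical.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to an unused code point at 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a token as displayed (hypothetical helper)."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


if __name__ == "__main__":
    for tok in ["çĶ", "ä½ľ", "å®"]:
        raw = token_bytes(tok)
        # errors="replace" keeps truncated multi-byte sequences from raising.
        print(tok, raw.hex(" "), raw.decode("utf-8", errors="replace"))
```

Plain ASCII tokens such as `gold` pass through unchanged, so the same helper can be applied to every row of both lists.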