INDEX
Explanations
instances where contrasting elements are being compared
the phrase "while", indicating a contrast or a conditional statement
Negative Logits
aer       -0.81
agin      -0.76
atron     -0.74
omet      -0.70
ahime     -0.70
red       -0.70
aja       -0.69
ison      -0.69
icted     -0.68
fleet     -0.68
Positive Logits
acknowledging   1.06
researching     0.97
browsing        0.92
respecting      0.91
conced          0.90
touring         0.85
agreeing        0.85
maintaining     0.85
admitting       0.85
compiling       0.78
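Each list pairs a token with the feature's direct effect on that token's logit. Below is a minimal sketch of how the two top-k lists could be produced. It assumes, and this page does not say so, that the effect is the dot product of the feature's decoder vector with each token's unembedding vector. The names `logit_effects` and `top_k`, the 2-dimensional vectors, and the numbers are hypothetical toy values, not the real model's weights.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_vec: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Hypothetical: dot the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_vec, vec)) for tok, vec in unembed.items()}

def top_k(effects: Dict[str, float], k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most positive first, most negative first), k entries each."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative

# Toy, made-up vectors (assumption): 2-d model, 4-token vocabulary
decoder_vec = [1.0, -0.5]
unembed = {
    "acknowledging": [0.8, -0.4],
    "browsing": [0.6, -0.2],
    "aer": [-0.6, 0.4],
    "fleet": [-0.4, 0.5],
}
effects = logit_effects(decoder_vec, unembed)
pos, neg = top_k(effects, 2)
print(pos)
print(neg)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other. A real vocabulary has tens of thousands of entries, so a vectorized matrix product would replace the Python loop.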
Activation Density: 0.043%
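Activation density is read here as the share of token positions on which the feature fires. The page does not define it, so the rule below is an assumption: an activation counts as firing if it is strictly above 0. The function name `activation_density` and the toy data are hypothetical. They are built only so that 43 firing positions out of 100,000 tokens reproduce the 0.043% figure.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Toy data (hypothetical): the feature fires on 43 of 100,000 token positions
acts = [0.0] * 100_000
for i in range(43):
    acts[i * 1000] = 1.5

print(f"{activation_density(acts):.3f}%")  # 0.043%
```

A density this low means the feature is silent on nearly every token, so the explanations above describe a rare, specific context rather than a common pattern.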