INDEX
Explanations
that introduces descriptive clauses
NEGATIVE LOGITS
Token       Value
not         0.58
doesn       0.57
isn         0.52
nicht       0.51
unsure      0.50
hasn        0.49
wasn        0.47
guien       0.47
neither     0.44
lacks       0.44
POSITIVE LOGITS
Token       Value
Grundlage   0.52
столь       0.50
accompagne  0.50
underlie    0.50
গৃহে        0.49
underpin    0.48
underlies   0.47
služ        0.46
pourtant    0.46
бу          0.45
Activation density: 0.055%