Explanations
Occurrences of the word "which."
Negative logits
a      -0.66
e      -0.60
age    -0.59
ed     -0.58
i      -0.57
'      -0.56
P      -0.55
C      -0.55
ee     -0.55
cy     -0.55
Positive logits
]**      0.87
we       0.86
soever   0.85
تقاوى    0.84
means    0.80
they     0.77
]--;     0.76
]+"      0.75
RTLD     0.75
"]}      0.73
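Feature dashboards of this kind commonly produce the logit lists by projecting the feature's decoder direction through the model's unembedding matrix, then reporting the vocabulary entries with the most negative and most positive values. The sketch below assumes that method. The function name `top_logit_effects`, the array shapes, and the toy vocabulary are hypothetical and do not come from this dashboard's actual code.

```python
import numpy as np

def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    Assumption (hypothetical setup): decoder_dir has shape (d_model,),
    unembed has shape (d_model, n_vocab), and vocab maps index -> token string.
    """
    logits = decoder_dir @ unembed          # one value per vocabulary entry
    order = np.argsort(logits)              # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "which", "we", "means"]
decoder_dir = np.array([1.0, 0.0])
unembed = np.array([[-0.5, 0.2, 0.9, 0.8],
                    [0.0, 0.0, 0.0, 0.0]])
neg, pos = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print(neg)
print(pos)
```

Sorting once with `argsort` returns both ends of the distribution. On a real vocabulary of tens of thousands of tokens, `np.argpartition` could replace the full sort when only the top k entries are needed.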
Activation density: 0.167%
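Activation density is usually the percentage of sampled tokens on which a feature is active. The minimal sketch below assumes "active" means an activation strictly greater than zero. The helper `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    activations = np.asarray(activations)
    return 100.0 * np.count_nonzero(activations > threshold) / activations.size

# Toy sample: the feature fires on 10 of 6,000 tokens.
acts = np.zeros(6000)
acts[:10] = 1.0
print(round(activation_density(acts), 3))  # prints 0.167
```

A density this low (about 1 token in 600) fits a narrowly scoped feature, such as one that responds to a single word.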