INDEX
Explanations
sentences starting with "This"
instances of the word "This."
Negative Logits
aths     -0.78
cery     -0.75
aws      -0.73
ikers    -0.69
nets     -0.67
ricks    -0.67
unk      -0.66
oller    -0.66
ARS      -0.65
amina    -0.65
Positive Logits
resulted       1.08
includes       1.06
means          0.99
ensures        0.97
entails        0.96
culminated     0.96
implies        0.96
prompted       0.95
latest         0.94
contradicts    0.94
Activations Density 0.149%